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I present a review of the theoretical and computational methodologies that have been used to model the 
assembly of viral capsids. I discuss the capabilities and limitations of approaches ranging from equilibrium 
continuum theories to molecular dynamics simulations, and I give an overview of some of the important con- 
clusions about vims assembly that have resulted from these modeling efforts. Topics include the assembly of 
empty viral shells, assembly around single-stranded nucleic acids to form viral particles, and assembly around 
synthetic polymers or charged nanoparticles for nanotechnology or biomedical applications. I present some 
examples in which modeling efforts have promoted experimental breakthroughs, as well as directions in which 
fT^ , the connection between modeling and experiment can be strengthened. 

o 

(N 
C . 

, CONTENTS 

I. Introduction 

A. Virus anatomies |^ 

B. Virus assembly 
Pq . 1 . Experiments that characterize capsid assembly 

^ ' 2. Motivation for and scope of modeling [sl 

' II. Thermodynamics of capsid assembly |6| 

A. Driving forces |^ 

B. Law of mass action 

1 . Estimating binding energies from experiments |^ 

. III. Modeling Self Assembly Dynamics and Kinetics of Empty Capsids [TTl 

lO ' A. Timescales for capsid assembly [Tl] 

^ : 1 . Scaling estimates for assembly timescales [Tsl 

2. Lag times [l5| 

3. The slow approach to equilibrium [l^ 

B. Rate equation models for capsid assembly 

C. Particle-based simulations of capsid assembly dynamics 

D. Conclusions from assembly dynamics models [l^ 

E. Differences among models |23| 
E Higher T numbers H 

1 . Structural stability of different capsid geometries ll^ 

2. Dynamics of forming icosahedral geometries ll^ 

IV. Cargo-containing capsids Hs] 

A. Structures |25| 

B. The thermodynamics of core-controlled assembly |2^ 

C. Single-stranded RNA (ssRNA) encapsidation ll^ 

D. Dynamics of assembly around cores |2^ 

V. Outlook m 
References H 



o 



- T— I 



* |hagan@brandeis.edu 



2 



I. INTRODUCTION 

The formation of a virus is a marvel of natural selection. A large number ( 60- 1 0,000) of protein subunits and other components 
assemble into complete, reproducible structures, often with extreme fidelity, on a biologically relevant time scale. Viruses play a 
role in a significant portion of human diseases, as well as those of other animals, plants, and bacteria. Thus, it is of great interest 
to understand their formation process, with the goal of developing novel antivirus therapies that can block it, or alternatively 
to re-engineer viruses as drug delivery vehicles that can assemble around their cargo and disassemble to deliver it without 
requiring explicit external control. More fundamentally, learning the factors that make viral assembly so robust could advance 
the development of self-assembling nanostructured materials. 

This review focuses on the use of theoretical and computational modeling to understand the viral assembly process. We begin 
with brief overviews of virus structure, assembly, and the experiments used to characterize the assembly process (section 
We next perform an equilibrium analysis of the assembly of empty protein shells in section |ll] In section we then present a 
simple mathematical representation of the assembly process and its relevant timescales, followed by several types of modeling 
approaches that have been used to analyze and predict in vitro assembly kinetics. We then extend the equilibrium and dynam- 
ical approaches to consider the co-assembly of capsid proteins with RNA or other polyanionic cargoes in section |IV] Finally, 
we conclude with some of the important open questions and ways in which modeling can make a stronger connection with 
experiments. 

In the interests of thoroughly examining the capsid assembly process, this revie w will not consider a number of interesting 
topics such as the structur al dy namics of complete capsids (e.g. lfl2l[89l[l29lll30ll ). capsid swelling or maturation transition s 
ifiol [177U18OI [I82L I217L mi izi), mechanical probing of assembled capsids (e.g. IH [M |5i|93|-|95| 



I2I8II ). or the motor-driven packaging of double-stranded DNA (dsDNA) into assembled procapsids 
reviewed in ll7i lll0ll230ll ). or the conformations of dsDNA inside capsids (e.g. il54l - ll56ll ). 

A. Virus anatomies 

Viruses consist of at least two types of components: the genome, which can be DNA or RNA and can be single-stranded or 
double-stranded, and a protein shell called a capsid that surrounds and protects the fragile nucleic acid. Viruses vary widely 
in complexity, ranging from satell ite tobacco mosaic virus (STMV), whose 1063 -nucleotide genome encodes for two proteins 
including the capsid protein ||220 tl to the giant Megavirus, with a 1,259, 197-bp genome encoding for 1,120 putative proteins 
that is larger than some bacterial genomes and encased in two capsids and a lipid bilayer Viruses such as Megavirus that 
acquire a lipid bilayer coating from the plasma membrane or an interior cell compartment of the host organism are known as 
'enveloped' viruses, whereas viruses such as STMV that present a naked protein exterior are termed 'non-enveloped'. Since 
Steph en Harrison and colleagues achieved the first atomic-resolution structure of tomato bushy stunt virus (TBSV) in 1978 
IIO9II . structures of numerous virus capsids have been revealed to atomic resolution by x-ray crystallography and/or cryo- 
electron microscopy (cryo-EM) images. An extensive collection of virus structures can be found at the VIPERdb virus particle 
explorer website (|http://viperdb.scripps.edu) ll212ll . 



The requirement that the viral genome be enclosed in a protective shell severely constrains its length and hence the number 
of protein sequences that it can encode. As first proposed by Crick and Watson ||62I1 . virus capsids are therefore comprised of 
numerous copies of one or a few protein sequences, which are usually arranged with a high degree of symmetry in the assembled 
capsid. Most viruses can be classified as rodlike or spherical, with the capsids of rodlike viruses arranged with helical symmetry 
around the nucleic acid, such as tobacco mosaic virus (TMV), and the capsids of most spherical viruses arranged with icosahedral 
symmetry. There are also important exceptions discussed below. The number of protein copies comprising a helical capsid is 
arbitrary and thus a helical capsid can accommodate a nucleic acid of any length. In contrast, icosahedral capsids are limited by 
the geometric constraint that at most 60 identical subunits can be arranged into a regular polyhedron. However, early structural 
experiments indicated that many spherical capsids contain multiples of 60 proteins. 

Caspar and Klug proposed geometrical arguments that describe how multiples of 60 proteins can be arranged with icosahedral 
symmetry, where individual proteins interact through the same interfaces but take slightly different, or quasi-equivalent, con- 
formations l'46'l. Protein subunits can be grouped into morphological units or 'capsomers', usually as pentamers and hexamers. 
Icosahedral symmetry requires exactly 12 pentamers, located at the vertices of an icosahedron inscribed within the capsid. A 
complete capsid is comprised of QOT subunits, where T is the 'triangulation number', which is equal to the number of distinct 
subunit conformations. 

In brief, a structure with icosahedral symmetry is comprised of 20 identical facets. The facets are equilateral triangles and thus 
themselves comprise at least 3 identical asymmetric units (asu). The only requirement of the asym metr ic units is that they are 
arranged with threefold symmetry, although many capsid prot eins have a roughly trapezoidal shape ll219ll and it has been argued 
that this shape is ideal for tiling icosahedral structures lll73ll . The Caspar Klug (C-K) classification system can be obtained 
starting from a hexagonal lattice as shown in Fig. [T] An edge of the icosahedral facet is defined by starting at the origin and 
stepping distances h and k along each of the respective lattice vectors. There is an infinite series of such equilateral triangles 




FIG. 1. The geometry of icosahedral lattices. (A) Different equilateral triangular facets can be constructed on a hexagonal lattice by moving 
integer numbers of steps along each of the h and k lattice vectors. (B) Construction of a T=3 lattice. Twenty copies of the triangular facet (left) 
obtained by moving one step along each of the h and k lattice vectors are aiTanged as shown in the middle panel, and then folded to obtain 
the icosahedral structure shown on the right. To connect this construction to a capsid, note that each pentagon will comprise five proteins in 
identical environments and each hexagon will comprise six subunits in two different types of local environments, resulting in a total of 180 
proteins in three distinct local environments. (C) Example icosahedral capsid structures. From left to right are shown the T =l sa tellite tobacco 
mosaic virus capsid (STMV) PDBID 1A34 (HI], the T=3 cowpea chlorotic mottle virus capsid (CCMV) PDBID ICWP dH, and the T=4 
human hepatitis B viral capsid (HBV) PDBID IQGT l27d l. Structures are shown scaled to actual size, and the protein conformations are 
indicated by c olor. In each image the 60 pentameric subunits are colored blue. The images of capsids in (C) were obtained from the Viper 
database 121211 . The images in (A) and (B) were reprinted from J. Mol. Biol., Johnson and Speir, 269, 665-675 (1997) Quasi-equivalent 
viruses: A paradigm for protein assemblies, with permission from Elsevier. 



corresponding to integer values of h and k. The area of such a triangle (for unit spacing between lattice points) is given by T/4, 
where T is the triangulation number defined as 

T = h^ + hk + k^. (1) 

Considering that the smallest such triangle T=l comprises three asu, the total number of asu in the facet is thus given by 3T and 
the total number of asu in the icosahedron is 60T. From Fig. [T]the individual asu's are not all identical for T > 1 since they 
have different local environments. Given the threefold symmetry of the facet, there are T distinct local environments and thus T 
distinct asu geometries. Fig.[Tj3 shows how to build a physical model for such a construction with T=3. 

The asu (i.e. individual proteins) within the icosahedral structure can be grouped based on whether they sit at a five-fold or 
threefold (quasi-sixfold) axis of symmetry into pentameric or hexameric 'capsomers'. Given that an icosahedron contains 12 
vertices with fivefold symmetry and the total number of proteins is given by GOT, there are 10(r — 1) hexamers. 

Many icosahedral viral capsids with T > 1 are comprised of only a single protein copy, meaning that the protein must adopt 
different configurations depending on its local environment. It was originally proposed by Caspar and Klug ll46ll that because 
the local environments are similar, or 'quasi-equivalent', the proteins in different environments could interact through the same 
interfaces. This has since been found to be correct for many icosahedral viruses, with structural differences between proteins at 
different quasi-equivalent sites often limited to loops and N- and C-termini. However, there can also be proteins with extensive 
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confor mational ch anges or even different sequences at different sites. Some examples of these structural differences are reviewed 
inRefs.millli. 

Some icosahedral virus capsid structures deviate from the class of lattice structures shown in Fig. [T] For example, the Poly- 
omaviridae, e.g. human papilloma virus (HPV), are comprised entirely of pentamer capsomers, which depending on their local 
environment are either five-fol d or six-fold coordinated. Generalizations of the C-K classification scheme have been proposed 
I253II which can describe polyom avirus cap sid shapes. Mannige and Brooks identified a relationship 
between hexamer shapes and capsid properties such as size lll73lll74ll . They also developed a metric for complexity of icosahe- 
dral morphologies, which resulted in a 'periodic table' of capsids ar id, co mbined with the assumption that the simplest structures 
are the fittest, revealed evolutionary pressures on capsid structures lll75ll . 

There are also non-spherical capsids with aspects of icosahedral symmetry. For example, the mature HIV virus capsid assem- 
bles into tubular or conical shapes ||29[_9^93| and some bacteriophages (viruses that infect bacteria) have capsids which are 



elongat ed or prolate icosahedra (e.g. 1841 " 185 1). The C-K classification system was extended to describe prolate icosahedra by 



Moody 11 1 8511 . We present some approaches to model the stability and formation of capsids that correspond to C-K structures or 
their generaUzations in section HITFI 



B. Virus assembly 

Viral assembly most generally refers to the process by which the protein capsid(s) form, the nucleic acid becomes encapsulated 
within the capsid, membrane coats are acquired (if the virus is enveloped), and any maturation steps occur For many viruses the 
capsid can form spontaneously, as demonstrated in 1955 by the experiments of Fraenkel-Conrat and Williams in which the RNA 
and capsid protein of TMV spontaneously assembled in vitro to form infectious virions I188II . 

The pathway of nucleic acid encapsulation differs dramatically between viruses with single-stranded or double-stranded 
genomes. Viruses with single-stranded genomes (the best studied of which have ssRNA genomes) usually assemble sponta- 
neously around their nucleic acid in a single step. This category includes many small spherical plant viruses (e.g. STMV or 
bromoviridae), the bacteriophage MS2, and animal viruses such as nodavirus. In many cases the RNA is required for assembly 
at physiological conditions, but the capsid proteins can assemble without RNA into empty shells /« vitro under different ionic 
strengths or pH. We also can include in this group the Hepadnaviridae family of viru ses (e.g. H epatitis B Vims (HB V)), which 
have a dsDNA genome but a capsid that assembles around an ssRNA pregenome ll22ill 16lll33ll . 

The extreme stiffness of a double-stranded genome (the persistence length of dsDNA is 50 nm) and the high charge density 
preclude spontaneous nucleic acid encapsidation. Thus packaging a double-stranded genome requires a two-step process in 
which an empty protein shell is assembled followed by packaging via ATP hydrolysis and/or complexation with nucleic acid 
folding proteins (e.g. histones llOll l262ll ). Of these viruses, the assembly processes have been most thoroughly investigated 
for dsDNA viruses, such as the tailed bacteriophages, herpes virus and adenovirus. These viruses assemble an empty capsid, 
without requiring a nucleic acid at physiological conditions, and a molecular motor which inserts into one vertex of the capsid 
II242]. The motor then hydrolyzes cellular ATP to pump the DNA into the capsid. 

In this review we will focus on the assembly of icosahedral viruses, first discussing the assembly of empty capsids such as 
occurs during the first step of bacteriophage assembly, and then co-assembly of capsid proteins with RNA, such as occurs during 
replication of ssRNA viruses, and finally co-assembly with other polyanions in in vitro experiments. We will not consider the 
assembly of rod-like viruses. Although not yet completely understood, the ass embly process for the rod-like virus TMV has 
been studied in great detail and has been the subject of several reviews ||42[ l45l Il44ll as well as more recent modeling studies 



1. Experiments that characterize capsid assembly 



The kinetics for spontaneous capsid asse mbly in vitro have been m easured with size exclusion chromatography (SEC) and 
X-ray and fight scattering (e.g. Ill HIH [US [lil |205[ |289[ I29TI1 ). Most frequendy, the fraction of subunits in capsids or 
other intermediates has been monitored using size exclusion chromatography (SEC) and the mass-averaged molecular weight 
has been estimated with light scattering. The SEC experiments show that under optimal assembly conditions the only species 
present in detectable concentrations are either complete capsids or small oligomers which we refer to as the b asic assembly 
unit. The size of the basic assembly unit is virus dependent; e.g. dimers for bromoviruses ^ and HBVll47ll275ll. or pentamers 
for picornaviruses (e.g. human rhinovirus (HRV)) and the polyomaviridae family [15^ (e.g. human papilloma virus (HPV)). 
Provided that intermediate concentrations remain small, the mass-averaged molecular weight and thus the light scattering closely 
track the fraction capsid measured by SEC. Example light scattering measurements from Zlotnick and coworkers 02911 are shown 
in Fig. [2}\ for HBV assembly at several ionic strengths. 

While these bulk experiments have provided tremendous information about capsid assembly kinetics, it has been difficult to 
characterize assembly pathways in detail because the intermediates are transient and present only at low levels. Complementary 
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FIG. 2. (A) Light scattering measured as a function of time for 5 fiM dimer of HBV capsid protein at indicated ionic strengths. Light scatter 
is approximately proportional to the mass-averaged molecular weight of assemblages and, under conditions of productive assembly, closely 
tracks the fraction of subunits in capsids (see text). (B) Simulated light scattering for 5 /iM subunit with indicated values of the subunit-subunit 
binding free energy (gb) using the rate equation approach described in section UlIB I Figures reprinted with permission from Biochemistry, 38, 
14644-14652 (1999), A Theoretical Model Successfully Identifies Features of Hepatitis B Virus Capsid Assembly , Zlotnick, Johnson, Wingfield, 
Stahl, Endres, Copyright (1999) American Chemical Society. 



techniques have begun to address this Hmitation. Restive pu lse sensing was used to track the passage of individual HBV 
capsids through conical nanopores in a membrane lllOSi l285ll . This Coulter-counter-like apparatus was able to distinguish 
between r=3 and T=A c apsids. Mass spectrometry has been used to characterize key intermediates in the assembly of MS2 by 
Stockley and coworkers ll23l |239[ l249ll (see section HIl F 2l l and for HBV and nodavirus by Uetrecht et al. ll255ll . Furthermore, 
fluorescent labeling of c aps i d pro teins lll3lll and in some cases RNA has enabled measuring assembly timescales for capsids in 
vivo (reviewed in Refs. ll24[|l32ll ). 



2. Motivation for and scope of modeling 

Even with the experimental capabilities to detect and characterize key intermediates for some viruses, theoretical and compu- 
tational models are important complements to elucidate assembly pathways and mechanisms. Each intermediate is a member of 
a large ensemble of structures and pathways that comprise the overall assembly process for a virus. Furthermore, assembly is 
driven by collective interactions that are regulated by a tightly balanced competition of forces between individual molecules. It is 
difficult, with experiments alone, to parse these interactions for those mechanisms and factors that critically influence large-scale 
properties. With a model, one can tune each factor individually to learn its affect on the assembly process. In this way, models 
can be used as a predictive guide to design new experiments. However, whether at atomistic or coarse-grained resolution, models 
involve important simplifications or other inaccuracies in their representation of physical systems. Thus, comparison of model 
predictions to experiments is essential to identify and then refine important model limitations. Iterative prediction, comparison, 
and model refinement can identify the key factors that govern assembly mechanisms. 

The large ranges of length and time scales (A-^m, ps-minute) that are relevant to most capsid assembly reactions hinder 
simulating the process with atomic resolution, although Freddolino et al. lf89ll performed an all-atom simulation of the intact 
STMV virus. Recently, approach es to syste matically coarse-grain from atomistic simulations have been applied to interrogate 
the stability of intact viruses lfl2i Il29[, | l30ll or to estimate subunit positions and orientations from cryo electron microscopy 
images of the immature HIV capsid lisll . All-atom molecular dynamics has been applied to specific elements of the assembly 
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reaction ll278ll . As we describe below, most efforts to model capsid assembly to date have considered simplified models which 
retain those aspects of the physics which are hypothesized to be essential; with the validity of the hypothesis to be determined 
by comparison of model predictions with experiments. 



II. THERMODYNAMICS OF CAPSID ASSEMBLY 

We will begin our discussion of viral assembly by analyzing the formation process of an empty capsid. While this process is 
most relevant to viruses that first form empty capsids during assembly, ssRNA capsid proteins have also been examined with in 
vitro experiments in which the ionic strength and pH were adjusted to enable assembly of empty capsids. 



A. Driving forces 

For assembly to proceed spontaneously, states with capsids must be lower in free energy than a state with only free subunits. 
The assembly of disordered subunits (and RNA or other components if applicable) into an ordered capsid structure reduces their 
translational and rotational entropy, and thus must be driven by favorable interactions among subunits and any other components 
that overcome this penalty. We begin here with the protein-protein interactions; the subunit-RNA interactions that promote 
ssRNA capsids to assemble around their genome are discussed in section|IV]and also reviewed in great detail by Siber, Bozic, and 
Podgomik ll230ll . Capsid proteins are typically highly charged and possess binding interfaces that bury large hydrophobic areas. 
Thus, as with most protein-protein interactions H, capsid assembly results from a combination of hydrophobic, electrostatic, 
van der Waals, and hydrogen bonding interactions. Covalent interactions typically do not play a r ole in assembly, although they 
appear during subsequent maturation steps for a number of viruses (e.g. the bacteriophage HK97 l27l[l ). 

Importantly, all of these interactions are short-ranged under assembly conditions. Van der Waals interactions and hydrogen 
bonds operate on length scales of a few angstroms. Electrostatic interactions are screened on the scale of the Debye length, 

1 /2 

Ad ~ O-S/C'sait , with Ad measured in nanometers and the salt concentration Csait measured in molar units. At physiological 
ionic strength, Csait = 0.15 M, the screening length is Ad ~ 1 nm; in vitro experiments typically occur within the range [0.0 5, 1] 
M. The hydrophobic interactions are similarly characterized by a length scale of approximately a 0.5 — 1 nm ll48ill98ll278ll . 

In many cases the interaction is primarily driven by hydrophobic interactions, attenuated by electrostatic interactions with 
directional specificity imposed by van der Waals interactions and hydrogen bonding at A length scales. The importance of 
hydrophobic interactions and the sometimes antagonistic contributions of direct electrostatic interactions have been shown by 
measuring the dimerization affinit y of t he C-terminal domain of the HIV capsid protein under an extensive series of mutations to 
the dimerization interface ||64[ l65lTl60ll . Furthermore, Ceres and Zlotnick IMtIi showed that the thermodynamic stability of HBV 
capsids increases with both temperature and ionic strength. The increase in stability with temperature suggests that hydrophobic 
interactions are the dominant driving force [48:]. The increase in stability with ionic strength, on the other hand, suggests that 
the salt screens repulsive electrostatic interactions which oppose protein association. Several models based on this hypothesi s 
reproduce the dependence of protein-protein interaction strength on ionic strength measured in the experiments lll39l I203ll230|] . 
However, it is worth noting that the experiments were performed on capsid protein with the highly charged C-terminal domain 
truncated, and it is difficult to pinpoint on the crystal structure which charges are responsible for repulsive interactions. Ceres 
and Zlotnick (4T\\ suggested that higher salt concentrations could enhance assembly by favoring a capsid protein conformation 
which is active for assembly. 



B. Law of mass action 

We now consider the assembly thermodynamics for subunits endowed with the interactions just described. We begin by 
considering the equilibrium for a system of identical protein subunits assembling to form empty T=l capsids. To make the 
calculation analytically tractable, we assume here that there is one dominant intermediate species for each number of subunits n; 
extending this assumption is conceptually straightforward. The word subunit refers to the basic assembly unit defined in section 

The total free energy Fec for a system of subunits, intermediates, and capsids in solution can be written as 

N 

FEc/k^T ^ ^B^/'" [log(p„fo) - 1] + PnG7 (2) 

n=l 

where vq is a standard state volume, p„ is the density of intermediates with n subunits, and G"'' denotes the interaction free 
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energy of such an intermediate. A plausible model for the interaction free energy is 

n 

G?:P(ffb)=E(n55b)-T5^™ (3) 



where rf- is the number of new subunit-subunit contacts formed by the binding of subunit j to the intermediate, is the free 
energy for such a contact, and 5''*'^^'^" accounts for degeneracy in the number of ways subunits can bind to or unbind from an 
intermediate (see the s factors in Refs. [80, 286| and Fig. |3]l. These terms a re specife d by the geometry of the capsid. Here we 
have subsumed rotational binding entropy penalties into (see Ref. ll28l[8ll[T05lll06ll ) and, to reduce the number of parameters, 
assumed that the binding energy gb is the same for all contacts. As discussed in section UTaI gh depends on temperature, ionic 
strength, and pW. Eq. [3] can be readily extended to allow for interface-dependent contact energies and subunit conformational 
changes iItsIi . 

To obtain the equilibrium concentration of intermediates we minimize Fec subject to the constraint that the total subunit 
concentration px is conserved: 

N 

^npn^pj. (4) 

n=l 

This yields the well-known law of mass action (LMA) result for intermediate concentrations ll40l l223ll 

PnVo = cxp[-/3(G;"P - n/i)] 

p = kBTlog{vopi) (5) 

with p the chemical potential of free subunits and (3 = l/k^T. Due to the constraint (Eq. |4|l Eq. [5]must be solved numerically. 
The result for a model dodecahedral capsid comprised of 12 pentagonal subunits is shown in Fig. [3]for several values of the 
binding energy gb- Notice that in all cases the capsid protein is almost entirely sequestered either as free subunits or in complete 
capsids. This prediction, which is analogous to the result for spherical micelles with a preferred diameter [l223l is generic to any 
description of an assembling structure in which the interaction free energy G'^n'^ is minimized by one intermediate size (n = N) 
and the total subunit concentration is conserved. 

To emphasize the generic nature of the prediction that intermediate concentratio ns are negligible at equilibrium, we also 
consider a continuum model of an assembling shell presented by Zandi and coworkers ll280ll . Each partial-capsid intermediate is 
described as a sphere, with a missing spherical cap. The unfavorable free energy due to unsatisfied subunit-subunit interactions 
at the perimeter of the cap is represented by a line tension cr, so the interaction free energy for a partial capsid with n subunits is 

G'="P(n) = 7ig,, + CT;(7i) -6 (6) 

with the perimeter of the missing spherical cap for a given by 

l{n) = 27ri?sine'(n) = vy^2[TTn{N - n)/NY/'^ (7) 

1/3 

with Wq the size of one subunit and g^ the binding free energy per subunit (not per contact) in a complete capsid. Finally, we 
have included & = gs + 2(t/(1) to ensure that free subunits have no interaction energy, since the continuum model breaks down 
for small intermediates. We set the line tension to cr = — .gs/2, which indicates that, on average, a subunit adding to the perimeter 
of the capsid satisfies half of its contacts. The resulting profile for G{n) is shown in Fig.|4]3, with the intermediate concentrations 
for several values of pj/ p* shown in Fig.|4};. In all cases, the intermediate concentrations are negligible. 

Two-state approximation. Based on the observation that intermediate concentrations are negligible at equilibrium, the equa- 
tions for capsid assembly thermodynamics can be simplified considerably by neglecting all intermediates except free subunits or 
complete capsids, so that 

Pt ^ pi + NpN- (8) 
Defining the fraction of subunits in capsids as /c ~ N p^ / pj, combining Eqs.[8]and|5j and rearranging, we obtain ll26lll 

/c 



In the limit ^ 1 this gives 



l-/c 

rl/N 

fc' Pi 



Nivo^TY^-'e-"'-'^. (9) 



1 - /c P* 

p*vo = exp (/3^ j A^-i/(A^-i) « exp (PG^/N) (10) 
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FIG. 3. (A) The assembly model for a dodecahedral capsid and the statistical weights associated with symmetries for the intermediates. The 
columns list respectively the number of intermediates, the lowest energy configuration, the degeneracy for adding an additional subunit (s„ in 
Eq|20]below), the degeneracy for losing a subunit (s„ in Eq|20t. the net degeneracy (55'/°'^" in Eq. [3j, the number of contacts gained by adding 
a subunit(nj i n Eg . [3]l, and the corresponding equilibrium constant. Only the first four and last two intermediates are shown; the full set are 
given in Ref. 128611 . (B) The mole fractions of each intermediate calculated using Eq. |5]and the statistical factors in (A) are shown for total 
subunit concentrations pT of 0.44/iM (□), 0.88^M (A), and 1.8^M (•). Figures reprinted from J. Mol. Biol., 241, Zlotnick, To Build a Virus 
Capsid: An Equilibrium Model of the Self Assembly of Polyhedral Protein Complexes, 59-67 Copyright( 1 994) with permission from Elsevier. 



with p* the pseudo-critical subunit concentration. In the asymptotic limits Eq. [TO]reduces to 

/c«(^j «1 forpT«P* 

forpT>/3* (11) 

The solution to Eq. [TT]is shown in Fig. |5]for several values of the capsid size N; note that the transition becomes sharper with 
increasing capsid size. Also notice that increasing the total subunit concentration pj or the magnitude of the binding energy (i.e. 
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FIG. 4. (A) Depiction of thie continuum model description of partial capsid intermediates considered by Zandi et al ||280|| . R is the radius of the 
capsid and the angle characterizes the extent of completion of the capsid. (B) Interaction free energy G{n) as a function of intermediate size 
n obtained from Eq|6] (C) Predicted mole fractions using Eq.|6]and Eq.|4]for pj = p* (o), px ~ 2p* (□), and px ~ 5p* (o). = — IS/cbT 
for (B) and (C). 



decreasing p*) always increases the fraction of subunits in complete capsids /c at equilibrium. We will see however in section 
|lll]that this trend does not always apply at finite but experimentally relevant timescales due to kinetic effects. 

Higher T numbers. If one or a few ground state capsid geometries are known (or pre-assumed), the thermodynamic cal- 
culation described above can be extended to describe capsids with larger T numbers in a straightforward manner Recalling 
from section ITAI that icosahedral capsids comprise T different subunit conformations (or in some cases protein sequences), the 



capsid free energy G^'' must be extended to include conformation energies and contact free energies which depend on the 
subunit conformation or species iItsIi . Approaches to determine the lowest free energy configuration(s) for a shell are discussed 
in section nil F II 



1. Estimating binding energies from experiments 

Zlotnick and coworkers have shown that the assembly of HBV ll47ll can be captured by Eq. [TO] using as a fit parameter 
(see Fig. |6j. These fits yield an important observation that the subunit-subunit binding free energies are quite small, on the 
order of ~ 4 kcal/mol {6.7kBT) for productive assembly reactions. The obs ervat ion that capsid assembly is driven by weak 
interactions of this magnitude appears to be a general rule for capsid assembly ll287ll . for reasons discussed in section Hill 

The conclusion that most of the interactions driving capsid assembly are weak appears to be broadly vahd. However, it is 
important to note that Eq. [TO]is an equilibrium expression, and thus strictly applies only on times exceeding any relevant reaction 
timescale. We can immediately see that this condition is beyond the reach of many experiments by estimating the timescale for a 
single sub unit to leave an assembled capsid. Consider a typical subunit-subunit association rate constant of / = 10^/M • s ( Isol 
Il26ll29lll . see section lUlBI ). and a typical binding free energy of gb = Q.lk^T. Since the dimer subunits of HBV are tetravalent, 
the first subunit must break four contacts to dissociate, with a timescale of about tdissociate ~ /cxp(4gb/fcBr) = 50 days. 
Similarly, we show in section IHl A ll that the approach of assembly toward equilibrium must lead to ever increasing nucleation 
barriers. Based on dynamical assembly simulations, our group has estimated that the values of could be underestimated by 
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FIG. 5. Fraction capsid /c as a function of subunit oversaturation pj p* predicted by Eq. [TO]for the number of subunits in a complete capsid 
N = 12, 60, and 1000. 




FIG. 6. (A) Fraction capsid measured for assembly of empty HBV capsids from capsid protein in which the RNA binding domain has been 
truncated, Cpl49, using SEC as a function of total dimer subunit concentration [CP149]totai. Results are shown for indicated salt concentrations, 
and the lines are fits to the equilibrium model with G''^ = 240(7b — T log(sjv) assuming four contacts per subunit and using the contact energy 
as a salt concentration dependent fit parameter, with the symmetry number of the complete r=4 capsid as sjv = 2^^®/120 l47ll . (B) 
Estimated values of gb as a function of temperature and ionic strength. Reprinted with permission from Ceres and Zlotnick, J. Mol. Biol., 
41, 1 1525-1 1531 (2002), 'Weak Protein-Protein Interactions Are Sufficient To Drive Assembly of Hepatitis B Virus Capsids, Copyright (2002) 
American Chemical Society. 
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about fefiT even for measurements taken at 24 hours due to this effect. 

The actual timescales for subunit dissociation from complete capsids can be estimated fro m ex periments that labeled subunits 
to monitor exchange with complete P22 capsids lll96ll as well as T=3 and T=4- HB V capsids i256ll . Subunit exchange on a period 
of days to months was indeed demonstrated for the P22 capsids and a fraction of the subunits in the T=3 HBV capsids. However, 
no subunit exchange was observed for T=4 HBV capsids, even when temperature was decreased to 4° C (recall that HBV is 
less stable at lower temperature). Similarly, Singh and Zlotnick i234ll measured substantial hysteresis for the dissociation of 
HBV capsids under denaturant. These observations raise the possibility that there are some steps which are irreversible (at least 
on measurement timescales) in the assembly process. Irreversible steps late in assembly or during a post-assembly maturation 
process make sense from the perspective of virus replication, as they would extend the period of time over which the virus can 
remain complete in infinite dilution and unfavorable environments. Of course, there must be a mechanism to release the genome 
once the virus has infected a host. 

The existence of irreversible steps cannot be directly revealed by assembly data alone. It has been shown that, even if 
there are assembly steps which are irreversible (on relevant timescales) late in the assembly cascade, as long as most steps are 
reversible the assem bly data can be fit to Eq{TO]with an apparent value of gb reflecting the free energy of the reversible steps 
IMIiMIIilllll. Similarly, comparison of the dynamical equations described in section HIlBI to kinetics data could only 
reveal the presence of irreversible steps on timescales exceeding the equilibration time associated with the reversible steps (e.g. 
> 50 days). 



III. MODELING SELF ASSEMBLY DYNAMICS AND KINETICS OF EMPTY CAPSIDS 

The experimental measurements of capsid assembly kinetics described in section ITB 1 [ provide important constraints on mod- 
els of capsid assembly kinetics. At the same time, they present an important opportunity for modeling; because only some 
intermediates can typically be characterized, models are essential to understand detailed assembly pathways. In this section we 
describe different modeling approaches which have been used to predict or understand the assembly kinetics. 



A. Timescales for capsid assembly 

We begin our description of capsid assembly kinetics by defining the potential rate limiting steps and presenting scaling 
estimates for their timescales. While our estimates are based on simplified models, we will see in the subsequent sections that 
many of the predictions remai n app licable when additional details are accounted for 

It was noted by Prevelige ll205ll that assembly kinetics for icosahedral capsids can be described in terms nucleation and 
elongation (or growth) timescales, closely analogous to crystallization. Nucleation refers to formation of a 'critical nucleus', or 
a structure which has a greater than 50% probability of growing to a complete capsid before disassembling. Elongation then 
refers to the timescale required for a critical nucleus to assemble into a complete capsid. In contrast to crystallization, there can 
be a well-defined elongation timescale since capsids terminate at a particular size. 

Nucleation. For any type of spheroidal shell, including an icosahedral capsid, the first subunits to associate create fewer 
interparticle contacts than those associating with larger partial capsids (see Figs. |3}^ and|4}3). Under conditions which lead to 
productive assembly the subunit-subunit binding free energy (gb) is weak (see Fig.|6l). Thus the favorable free energy of these 
contacts is insufficient to compensate for the mixing and rotational entropy penalties incurred upon association, and the small 
intermediates are unstable. However, at subunit concentrations above the pseudocritical concentration p* there must exist a 
critical size above which there are sufficient interactions such that further assembly is more probable than dissociation. In fact, 
the number of interactions depends on the partial capsid geometry, and thus there is an ensemble of critical nuclei with different 
sizes. 

It is often assumed that the dominant assembly pathways pass through one or a few critical nuclei with the smallest sizes and 
thus the assembly probability can be approximated by a single valued function of partial capsid size n (i.e. n is a good reaction 
coordinate ||66ll ). Then, the critical nucleus corresponds to a maximum in the grand free energy, defined as il„ = Gn ~ l^n, with 
Gn the Gibbs free energy for an intermediate with n subunitsn and fi the chemical potential. For the thermodynamic models of 
partial capsids presented in section BTBI the grand free energy is given by 

r!„ = G5fP-A:BTnlog(pit.o). (12) 

with Gn''' the interaction energy for a partial capsid intermediate with n subunits and pi the free subunit concentration. 

Th e effe ct of the shell geometry on the critical nucleus size can be understood elegantly from the continuum model of Zandi 
et al. l280ll in which partial capsid intermediates are described as spheres missing hemispherical caps with the partial capsid 
interaction free energy C^^^ given by Eq.|6] The critical nucleus is then calculated as the maximum in (Eq. [T2] ) to give 
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FIG. 7. Image of the CCMV pentamer of dimers that experiments 128911 indicate is the critical nucleus. Atoms are shown in van der Waals 
representation and colored according to their quasi-equivalent con form ation, with A monomers in blue and B mon omers in red. The coordinates 
were obtained from the CCMV crystal structure, PDBID ICWP f23§} using the Viper oligomer generator ll212ll and the image was generated 
with VMD Ii l23i1. 
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FIG. 8. The grand free energy as a function of intermediate size for different free subunit supersaturation values (pi/ p") as calculated by the 
continuum model for a capsid with 90 subunits. These curves would comspond to the free energy profiles at increasing times for a reaction 
which begins with p\ = 3p* and proceeds toward equilibrium with p\ = p* . 



"n- = 0.5A^ - (JTJ^^) (13) 

with r = [ps — ln(piwo)]/o-. Notably, the critical nucleus size depends on the binding energy and subunit concentration, and 
decreases with increasing supersaturation of free subunits {pi/ p*). Plots of f2(n) are shown for several values of supersaturation 
inFig.lUlAl 

The free energy forms for models which account for the icosahedral geometry of capsid structures are similar to the continuum 
model just described, except that the critical nucleus tends to correspond to a small polygon, which is a local minimum in 
the free energy since it corresponds to a local maximum in the number of subunit-subunit contacts (see Fig. O. Although 
the assumption that there is one dominant intermediate per partial capsid size is an oversimplification, simulations |106, 20i 
and theory iItqI I184|1 indicate that under many conditions assembly pathways predominantly pass through only a few nucleus 
structures which correspond to co mpletion o f small polygons. Measured critical nucleus sizes under simulated conditions have 
ranged from 3-10 subunits iItI |76[ [l4ll llosll . 

Experime ntall y, nucleation has also been shown to correspond to completio n of polygons, such as the pentamer of dimers 
for CCMV ll289ll shown in Fig. |7]or a trimer of dimers for turnip crinkle vims ll237ll . However, it is likely that intertwining of 
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flexible terminal arms and other subu nit co nformation changes can provide additional stabilization u pon p olygon formation. In 
the case of MS2, mass spectrometry ll239ll identified two polygonal intermediates, which modeling lll87ll suggested were each 
critical nuclei for a different assembly pathway, with the prevalence dictated by binding to RNA. Most computational simulations 
of icosahedral capsids to this point have not incorporated allostery. Including stabilization due to polygon- or RNA-associated 
allostery could enable a particular structure to remain as the predominant critical nucleus over a wider range of interaction 
strengths and subunit concentrations than is predicted by more basic models. 

Elongation. The association of subunits after nucleation has been described as elongation or growth. In contrast to the 
transient intermediates found below the critical nucleus size(s), intermediates in the growth phase are stable. Simulations indicate 
that association usually proceeds by the sequential addition of one or a few subunits at a ti me, although b inding of larger 
oligomers can be significant at high concentrations or fo r som e subunit interaction geometries lll05l l243ll283ll . Association of 
large oUgomers can also misdirect the assembly process ll268ll (section BllCI ). 



1. Scaling estimates for assembly timescales 

To facilitate the presentation of how the timescales of nucleation and growth depend on system parameters, we first consider a 
highly simplified assembly reaction. I t was shown that the conclusions from this simplified reaction remained valid when more 
sophisticated models were considered lll07ll . 

We consider a system of capsid protein subunits with total concentration pj that start assembling at the time t = 0. Our 
reaction is given by: 

1 ^ 2 ^ • ■ ■ ^ n^uc ■ ■ ■ N (14) 

bnut &niic bnut bcluni! belong 

where N is the number of subunits in a capsid, pi is concentration of unassembled subunits, bi is the dissociation rate constant 
(with i = {nuc,elong}), which is related to the forward rate constant by the equilibrium constant, vobj = /exp {(igi), with 
gi the change in interaction free energy upon subunit association to a partial capsid and wq the standard state volume. The 
nucleation and elongation phases are distinguished by the fact that association in the nucleation phase is not free energetically 
favorable, pi exp(— /J^nuc) < 1, while association in the elongation phase is favorable, pi exp(— /3geiong) > 1- Similar results 
can be obtained by assuming that the forward rate constant differs between the two phases Ii29 li1 . For the moment, we assume 
that there is an average nucleus size rinuc- 

We write the overall capsid assembly time t as 

'Hiuc^t'Teiong; (15) 

with Tnuc and Tgiong the average times for nucleation and elongation, respectively. The timescale for the elongation phase can be 
calculated as the mean first passage time for a biased random walk with reflecting boundary conditions at rinuc and absorbing 
boundary conditions at N, with forward and reverse hopping rates given by f pi and &eiona, respectively. This gives ll2lll 

^ belong f belong j / belong | ^ (16) 

" °"° fPl - belong V fPl - belong / \ f Pi ) 

with rieiong = N — rinuc- In the limit of fpi ^ 6eiong Eq.[T6]can be approximated to give feione ~ "-eiong/Zpi, while for similar 
forward and reverse reaction rates, fpi w 6eiong, it approaches the solution for an unbiased random walk ieiong ~ '^dong/^/Pi- 

Under conditions of constant free subunit concentration pi, we can derive the average nucleation time with an equation 
analogous to Eq.M iMITol 

^niic 1 1 ^nuc \ I ^nuc 

fPl - bnuc V fPl - bnuc J \ f Pi 

cxp {GZ,_,/ksT) p-"- = l/fpi cxp (-17„„„^_i/fcBT) (17) 

This equation can be understood as follows. Because nucleation is rare on timescales of individual subunit binding events, 
pre-nuclei reach a quasi-equilibrium with concentration p„ = p" cxp[— /3Gn'''] (see (|5])). The nucleation rate, t^:^^ is then given 
by the rate of subunits adding to the largest pre-nucleus nnuc — 1, t~;!^ = /piPn„„c-i which gives the second line of ( [TT] ). A 
comparable expression is derived under a continuum limit in Ref. l280ll in which the timescale for a subunit to associate with 
the critical nucleus is replaced by a critical nucleus survival timescale. 

However, because free subunits are depleted by assembly, the net nucleation rate never reaches this value and asymptotically 
approaches zero as the reaction approaches equilibrium. Instead, treating the system as a two-state reaction with n„ 
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FIG. 9. The scaling expression for the median assembly time ri/2 as a function of subunit concentration predicted by Eq. \TS\is compared to 
full numerical solutions of the rate equations Eq. [T4](see section ITlIBb . The numerical results are shown for completion fraction /c (□) and 
estimated light scattering (+), while the theoretical prediction Eq. [T8]is shown as a dashed line. The estimate for the crossover concentration 
Pc (Eq. |19t above which the light scattering and completion fraction do not match is shown with a • symbol, and the concentration at which 
the monomer starvation kinetic trap increases overall assembly times pki is shown as a ■ symbol. Parameter values are gnm = TksT (« 4 
kcal/mol) j47ll . gdong = Sgnuc, On ~ 4(7nuc, capsid size N = 120 corresponding to 120 di mer s ubunits in a Hepatitis B Vi rus c apsid j47ll . the 
critical nucleus size rinuc = 5, and the subunit association rate constant / = 10^ M^^s~^ Il26ll . Based on data from Ref. llOTll . 



kinetics yields an approximation for the median assembly time ri/2, the time at which the reaction is 50% complete llOTll 

mi ~ ^i:77 cxp (G„„„_i/fcBr) Po (18) 



with /c'' as the equilibrium fraction of subunits in complete capsids, which can measured experimentally ll47ll . The factor of 
in Eq. [TS] accounts for the fact that N subunits are depleted by each assembled capsid. This prediction is compared to 
simulated assembly times in Fig.|9] 

Kinetic trap. We found that the relationships between Teiong and Eq. [18] and assembly times begin to fail at a crossover 
concentration pc for which the initial rate of subunit depletion by nucleation (A^/rnuc) is equal to the elongation rate. For larger 
subunit concentrations, new nuclei form faster than existing ones complete assembly, and free subunits are depleted before most 
capsids finish assembling. The system then becomes kinetically trapped at a larger concentration p^t defined by the point at 
which the median assembly time ri/2 matches the elongation time. These concentrations are related to binding free energies and 
other parameters by 

Teiong « Tnuc/iV for pj = 

Teiong « Ti/2 for pj = Pkt (19) 

with Tnuc and T1/2 respectively given by Eq.[T7]and Eq.fTS] 

A kinetic trap arising fro m de pletion of f ree s ubunits (Eq.([T9ll) was first noted by Zlotnick ll80ll286l l29lll and was observed 
in ex periments on CCMV ll289ll and HBV l29lll (see the largest ionic strength in Fig. |2]\). Morozov, Bruinsma, and Rudnick 
il86ll elegantly recast a model similar to Eq. (fl4] i in a continuum description, within which the time evolution of concentrations 
of capsid intermediates resembles a shock wave. If the wave does not reach the size of a complete capsid before free subunits 
are depleted then the system is trapped. 

While the continuum model correctly predicts the presence of the free subunit depletion trap, the computer simulations 
described in sections Fill Bl and IlII CI show that productive capsid assembly reactions do not resemble a Shockwave. Because 
nucleation is a stochastic event, each capsid elongation process starts at a different time; i.e., they are out of phase. For pj < 
Pkt relatively few capsids are assembling at any given time, and thus intermediate concentrations remain at low levels. The 
Shockwave could only arise in the limit of Tnuc ^ Teiong, in which case the system would be severely trapped. This trap can be 
avoided though for reactions in which subunits assemble around RNA or nanoparticles (section [TVl . provided that subunits are 
in excess. 
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FIG. 10. The lag time is related to the mean elongation time. (A) Completion fractions /c measured from Brownian dynamics simulations 
of a particle based model (section llllCI l are shown as a function of time for indicated total subunit volume fractions (ut). (B) The duration of 
the lag phases from the simulations shown in (A) are compared to mean capsid elongation times. The crossover volume fraction Vc estimated 
from Eq.[T9]is shown as a • symbol. The plotted data is from Ref. lIlOTll . 



2. Lag times 



A distinctive feature of many capsid assembly kinetics measurements is an initial lag phase before detectable assembly occurs 
(e.g. Fig. |2]i whose duration decreases with increasing subunit concentration or subunit-subunit binding free energy. Although 
Zlotnick and coworkers |80, 286, 291] showed that partial capsid intermediates assemble during the lag phase, it has often been 
assumed that the duration of the lag phase corresponds to the time required for the concentration of critical nuclei to reach steady 
state, in analogy to models of actin nu cleat i on. H owever, the simple model described above can be solved exactly in the limit 
of constant free subunit concentration lll03[ Il07ll . in which case the lag phase in /c is equal to the mean elongation time Teiong 
estimated in the previous section. Because the free subunit concentration is nearly constant during the lag phase under most 
conditions, this relationship holds even when the assumption of constant free subunit concentration is relaxed. 

To illustrate this relationship, mean elongatio n tim es and lagtimes calculated from Brownian dynamics simulations of a 
particle-based model (see section ITlI CI and Ref llOTll ) are shown in Fig. [TO] We see that the correspondence is excellent 
until the reaction approaches the crossover concentration (estimated from Eq. [T9] l. Th is corresp ondence could be tested 
experimentally by comparing elongation times measured by single molecule experiments ll24lll3llll32ll with lag times measured 
by bulk experiments. 
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3. The slow approach to equilibrium 



To this point in this section we have made the simplifying assumption that there is a fixed critical nucleus size. However, 
Eq. [T3] shows that in general the critical nucleus size is a function of the free subunit concentration. As subunits are converted 
into capsids by the reaction, the free subunit concentration (pi) decreases and hence the nucleation barrier grows. In Fig. Illl Al 
nucleation barriers calculated using Eqs. |6]|7] and[T2]for a capsid with iV = 90 subunits are shown at three time points (i.e. three 
free subunit concentrations pi) for a reaction which begins with a supersaturated free subunit concentration pi{t = 0) = 3p* 
with p* the pseudocritical subunit concentration (section BlBb . As the reaction begins far out-of-equilibrium there is a relatively 
small critical nucleus size and correspondingly a small nucleation barrier However, as the reaction approaches equilibrium 
Pi = p* the size increases to a half-formed capsid and the barrier increases to SOk^T. Substitution of this free energy barrier 
into Eq[T7]shows that the reaction timescale far exceeds the experimentally accessible timescales at this point. In other words the 
reaction only approaches equilibrium asymptotically. As noted in section HTBI this effect can lead to underestimating subunit- 
subunit binding energies when finite-time assembly data is fit to equilibrium expressions. 

The observation that, at equilibrium, the critical nucleus corresponds to a half capsid rinuc = N/2 is rather generic. It 
results from the fact that equilibrium is reached when the free subunit concentration decreases to the point at which the chemical 
potential of a free subunit is equal to that of a subunit in a complete capsid, p^'' = cxp (/3G^''/Af). Thus, including additional 
complexities, such as subunit conformational changes or interface-dependent binding energies would not qualitatively change 
the result. We reiterate, though, that for the supersaturation conditions pi > p* from which productive assembly begins, the 
critical nucleus size will generally be much smaller than its equilibrium value. 



B. Rate equation models for capsid assembly 

Zlotnick and coworkers ll80[|286ll29lll developed an approach to simulate empty capsid assembly via a system of rate equations 
that describe the time evolution of concentrations of empty capsid intermediates. The idea is analogous to the classic kinetic 
rate equations for cluster concentrations in a system undergoing crystallization proposed by Becker and Doring ll25ll and then 
derived from the microscopic dynamics of the lattice gas (or Ising) model by Binder and Stauffer i34| |. In contrast to the 
models for crystallization, however, the capsids terminate at a finite size. The initial works of Zlotnick and coworkers used the 
simplifications that there is one species of intermediate for each size n, and that only single subunits can bind or unbind in each 
step, which gives the following equations 

dpi ^ 

-J^ = -flSlpl + + ^ -fnSnPnPl + KSnPn 

n=2 

-TT = In~lSn-lPlPn-l - fnSnPlPn n = 2. ..N 

dt 

(20) 

where pn is the concentration of intermediates with n subunits, /„ and 5„ are respectively association and dissociation rate 
constants for intermediate n, and s„ and s„ respectively describe the degeneracy for binding and unbinding ifsoll . While Eq. |20] 
resembles a Master equation, notice that the factors of pi in the association reactions introduce nonlinearities which complicate 
its solution. Therefore we will refer to the equations in this section as 'rate equations'. 

The association and dissociation rate constants are related by detailed balance as 6„ = fn-^i cxp [(G"'' — G"^]^)/A:bT] /vq, 
with Gj^i'''' the interaction free energy of a partial capsid with n subunits and vq the standard state volume. Specifying the assembly 
model requires defining the set intermediate geometries and their free energies (e.g. see section HTbI Eqs. (|3]l or (|6)) and the 
association rates /„. 

Despite the extreme simplifications leading to Eq. |20] rate equations of this form have shown good agreement with many 
features of experimental assembly kinetics data, including the assemb ly ki netics of HBV l29lll (Fig. |2j3), the assembly of 
CCMV into diff erent polymorphs depending on subunit concentration il26r . the short time kinetics of BMV assembly ll52ll . 
SV40 assembly ll35ll . the impact of RNA on MS2 assembly 111 87(1 . and the assembly of HIV capsid protein into tubes l250ll 
(Fig. [UK). 

The assumption of only a single structure per intermediate size can be relaxed at the cost of increased computational complex- 
ity. For example Zlotnick and coworkers [79, 184] have enumerated the space of all possible well-forme d clus ter configurations 
for two geometries, and catalogued t he ensemb le of pathways surpassing a threshold value of probability lll84ll . In an alternative 
approach, Schwartz and coworkers ||243[ l283ll have used continuous time Monte Carlo (known as the Bortz-Kalos-Lebowitz 
ll36ll or Gillespie ll96ll algorithm) to stochastically sample pathways cons istent with kinetic rate equations. They have particu- 
larly considere d the effec ts of binding between oligomers Il243[|277ll28l and an optimization routine to fit parameters to light 
scattering data IMIlTTlKseeFig.IIB. 
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FIG. 11. (A) In vitro assembly of HIV CA protein into tubes monitored by absorbance (red diamonds, with thick grey lines indicating error 
bars) at indicated subunit concentrations compared to best fits using a rate equation model (black lines). (B) Light scattering for HPV LPl 
assembly from Casini et al. (light grey diamonds) compared to a continuous time Monte Carlo trajectory using parameters optimized to 
the data (solid black line). The dashed curve corresponds to a trajectoiy with parameter values reduced by 2.5 x lO'' from their optimal values 
and negative values truncated to zero, to simulate a threshold level of signal to background scattering. (A) is reprinted with permission from 
Biochemistry, 51, 4416-4428 (2012), A Trimer of Dimers Is the Basic Building Block for Human Immunodeficiency Virus-I Capsid Assembly, 
Tsiang, Niedziela-Majka, Hung, Jin, Hu, Yant, Samuel, Liu, Sakowicz, Copyright (2012) American Chemical Society. (B) is reprinted with 
permission from Phys. Biol., 7, 045005 (2010), Kumar and Schwartz, A parameter estimation technique for stochastic self-assembly systems 
and its application to human papillomavirus self-assembly. Copyright (2010) lOP Publishing. 



Several groups have also developed continuum-level descriptions of assembly dynamics which allow for analytical treatment 
lll86ll26l]l . Van der Schoot and Zandi applied the classical t heory of spinodal decomposition (model A dynamics) to examine 
the late-stage relaxation of assembly dynamics, while Ref. Il86ll is discussed in section IIIBI However, the important role of 
nucleation in the kinetics has not yet been incorporated into these treatments. 

Limitations and advantages of the rate equation approach. The key advantage of state-based over particle-based ap- 
proaches discussed next is that the former do not track diffusive motions of individual subunits and thus can access larger 
system-sizes and timescales. However, even extended state-based approaches require pre-definition of the accessible state space 
(i.e. the structures of intermediates for each size n) and the transition rates between them. To date these methods have not been 
used to address the possibility of strained interactions between subunits which deviate from the ground state of the pairwise 
interaction potential. The possibility of strained interactions would greatly expand the set of possible cluster configurations, hin- 
dering predefinition of the state-space. Secondly, the state-based approaches used to date assume a uniform spatial distribution 
of free subunits (a mean-field approximation), neglecting any particle-particle correlations or rebinding kinetics. However, there 
is no evidence from particle-based simulations or experiments that this approximation leads to significant error. 
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FIG. 12. Examples of two classes of models for icosahedral shells. (A) A patchy-s phere model with the pentavalent subunit interaction 
geometiy of a T=l capsid (see Fig. [Tp), but spherically symmetric excluded-volume lIlOSll . In the top image, two interacting subunits are 
shown, with numbered arrows indicating the locations of the 5 distinct attractive patches. The lower image shows an assembled capsid, with 
patches colored green. (B) An extended subunit representation of a T=l capsid. In the top image, the large cyan spheres experience repulsive 
excluded-volume interactions while small yellow spheres on complementary faces experience attractive interactions. The lower image shows a 
complete capsid, with subunits reduced in size for visibility and the locations of attractive patches indicated by green cylinders. The images in 
(B) are reprinted with permission from Rapaport, Phys. Rev. E, 70, 05 1905 (2004), Self-assembly of polyhedral shells: A molecular dynamics 
study. Copyright (2004) by the American Physical Society. 



C. Particle-based simulations of capsid assembly dynamics 



In this section we consider simulations of capsids or other polyhedral shells that explicitly track the positions and orientations 
of each subunit. Thus, once the model has been defined no further assumptions about pathways or the state space are required. 
Capsid proteins typically have several hundred amino acids and assemble on time scales of seconds to hours. Thus, simulating 
the spontaneous assembly of even the smallest icosahedral capsid with 60 proteins at atomic resolution would entail an extreme 
computational demand II89I1 . However, it has been show n tha t the capsid proteins of many viruses adopt folds with similar 
excluded-volume shapes, often represented as trapezoids |1l73ll. Several groups have therefore developed models for subunits 
which, although highly simplified, retain the most important features. Namely, they have an excluded-volume geo metry and 
orie r itation-dependent a t tract i ons designed such that t he lowest energy structure is shell with icosahedral symmetry l76lll05l 
flM fill. fl2i [173. flMJl90ll20^ 

The coarse-grained particle-based simulation models can be roughly separated into three classes. We will use the term 'patchy- 
sphere models' to refer to models in which the subunit has spherically symmetric excluded-volume and patches with short-ranged 
attractions arranged such that the lowest energy configuration corresponds to a particular target structure (see Fig. IT2t\). Patchy- 
sphere models are quite general and have long been used to represent decorated colloids (e.g. [33]) as well as proteins (e.g. 
|[l6l[l ). The pa tch-patch potential can include angular and dihedral terms to control the overall directional specificity of the 
attraction llOSl 227l 272ll and patches with different interaction length scales to control preferred face angles of assembling 
polyhedrons ll28l[265|]. 

The second class of models, first developed by Rapaport i209||, considers an extended subunit comprised of spherically 
symmetric 'pseudoatoms' arranged to have short-ranged attractions and excluded- volume geomet ries that mimic features of 
protein geometries seen in capsid structures (Fig. [T2B). For example, several groups have considered models in 

which subunits have a trapezoidal shape which is roughly consistent with that of capsid proteins with the beta-barrel architecture 
ll219|l or models in which 20 triangular subunits (which could coiTespond to protein tiimers) form icosahedral shells [76, 106l 
isl l209l |210[| . Extended subunits have also been used to model nanoparticles with a variety of shapes (e.g. lf284ll ). In a 



third class of models, subunits have polygonal interaction directio ns, but rath er than tracking their diffusion, they are irreversibly 
placed onto growing capsids in energy-minimized configurations ll 12l[l57ll . 

The early history of particle-based caps id a ssembly simulations. The first dynamical simulations of capsid assembly 
were performed by Schwartz and co-workers [[2271 . who considered a patchy-sphere type model with complementary attractive 
interactions directed such that lowest energy configurations corresponded to 60-subunit T=l closed shells. Their exploratory 
simulations using dissipative molecular dynamics identified the importance of annealing during assembly, as uncorrected as- 
sembly errors tended to lead to malformed structures. Rapaport considered models for icosahedral shells in which subunits have 
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triangular [[209|] or trapezoidal II210II excluded-volume geometries. The early simulations II210II included dynamically unrealistic 
rules which limited the number of nucleation sites, but suggested that a simple interaction potential could direct assembly of 
well-formed capsids and that subunit association rates do not decrease dramatically as capsids near completion; 

The first statistical estimates of assembly into icosahedral shells were obtained by Hagan and Chandler lIlOSll using over- 
damped Brownian dynamics with several patchy-sphere models for T=l shells. They constructed a 'kinetic phase diagram' 
showing the dominant assembly products as a function of subunit concentration, subunit-subunit interaction strength, and the 
orientational specificity of subunit interactions (see below). They also found that assembled capsids w ere hi ghly metastable in 
infinite dilution and disasse mbly showed significant hysteresis, as seen in experim ents on HBV ca psids ll234ll . 

Nguyen and co-workers III88II used discontinuous molecular dynamics yl l26l l206l 12071 l235[l to simulate the assembly of 
models in which subunits had short-range attractive interactions and excluded-volume geometry shapes of triangles or kites. 
In contrast to other models in which subunits are rigid bodies, the pseudo-atoms comprising each subunit were connected by 
infinite hardwall potentials and thus the subunits had some internal degrees of freedom. They predicted a phase diagram with 
many features in common with those of other models, except that incorporation of the last subunit was hindered by the internal 
degrees of freedom of the nearl y complete part ial capsids (see below). 

Doye, Louis, and coworkers lll28l |272| - |274|| used Monte Carlo simulations to study the dynamics and thermodynamics of a 
variety of patchy-sphere models, with ground state geometries that include the set of regular polyhedra, T=l shells, and r=3 
shells. For interaction potentials which did not incorporate dihedral angles (motivated by patch y coll oids), they found that 
assembly could proceed through a disordered liquid state intermediate for s ome parameter ranges [l273l and competition with 
disordered states led to a dodecahedra being kinetically inaccessible i274ll . The liquid state disappeared when the dihedral 
potentials consistent with protein-protein interactions were included. 

Hicks and Henley ll 12ll proposed a model for assembly of HIV capsids in which triangular subunits were irreversibly attached 
to sites on a growing cluster according to the local geometry to form hexagonal or pentagonal substructures, with a subunit- 
subunit interaction geometry that defined a preferred spontaneous curvature. They found that under irreversible attachment the 
model produced an ensemble of ir regul ar structures, but not the conical shells observed in EM images of mature HIV capsids 
ll29il91ll . Levandovsky and Zandi ll57ll extended and modified the model to allow merging and for the structure to minimize 
its elastic energy at each step. The model predicted an ensemble of structures which closely resemble those seen in retrovirus 
capsids, and suggested that the protein spontaneous curvature plays a key role in determining the capsid shape (i.e. spherical, 
conical, etc.). 

Higher resolution representations of capsid proteins have been used to simulate parts of assembly pathways. For example 
Chen and Tyco Ii50i1 developed a model for the HIV capsid protein which includes information about subun it-sub unit contacts 
derived from NMR studies, and simulated the early stages of assembly in two dimensions. Tunbridge et al. i252ll modeled the 
assembly of small oligomers of HBV proteins with the proteins modeled as rigid bodies using a transferable one-bead-per-amino 
acid model lll40ll . Futhermore systematic coarse graining from atomistic simulations was used to estimate subunit positions and 
orientations from cryo electron microscopy images of the immature HIV capsid ifisll . A study of dimerization of the C-terminal 
domain of the HIV capsid protein at atomic resolution with explicit water showed that water in the vicinity of the protein-protein 
interface sits at the edge of a drying transition ll278ll . 



D. Conclusions from assembly dynamics models 

Having presented an overview of some of the theoretical and computational models of capsid assembly dynamics, we highlight 
a few of the more important conclusions that have emerged these studies. We will see that many of the predictions are consistent 
across both rate equation and particle-based models and match experimental observations. 

Capsid assembly kinetics are sigmoidal. Consistent with the experimental measurements, the theoretical and computational 
models predict sigmoidal assembly kinetics. Example predictions are shown for rate equation models in Figs. |2j3 andlTTIfor 
Brownian dynamics simulations in Fig. [ToK and for molecular dynamics simulations in Fig. [13] In all cases, there is an 
initial lag phase during which capsid intermediates form, followed rapid capsid production and then an asymptotic approach to 
equilibrium. 

Intermediates do not build up. Fig. [13] shows the fraction of subunits in intermediates of all sizes as a function of time for 
a model icosahedron IZQSi for three values of the binding energy. For the two smaller values which lead to productive assembly 
there is never a significant fraction of subunits found in intermediates. A similar result is found for both rate equation models 
and other computational models, consistent with experiments where intermediates are generally not detectable for productive 
parameters. 

The duration of the lag phase is set by the mean capsid assembly time. This observation was described in section HTl A 21 
(Fig. [Hi). 

Optimal assembly occurs when subunit-subunit association is reversible. Dynamical simulations predict that capsid yields 
at finite observation times are nonmonotonic with respect to control parameter values (e.g. subunit concentration, binding 
energy, or specificity), which is consistent with experimental observations ll47[|289[|29lll . The plots showing yield as a function 
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FIG. 13. The time evolution of cluster size distributions are shown for three interaction strengths, parameterized by e, for molecular dynamics 
simulations of the triangular subunit model shown in Fig. [T2b. The model capsid comprises 20 subunits; the system has entered a kinetic 
trap at the highest interaction strength. Figure reprinted with permission from Phys. Biol., 7, 045001 (2010), Rapaport, Modeling capsid 
self-assembly: design and analysis, Copyright (2010) lOP Publishing. 



of time in Figs. l2b. [ToK. and [13] demonstrate suppressed capsid production at the highest subunit concentration or binding 
energy. Similarly, examples for which dynamical simulation results at long times have been presented in cross-sections of phase 
diagrams are shown in Figs. [14] [15] and [16] LI 941 . 

It is worth noting that the subunit-subunit binding free energy (^b in ©) cannot be directly determined from force field 
parameters in the simulations referenced in Figs. [14] [15] and [16] as the binding entropy penalty depends on the length scale 
and directional specificity of the interaction. Binding free energies estimated from umbrella sampling (e.g. [105, 106, 274J) 
for optimal parameter values were of order 5 — lOk^T depending on particle concentrations, consistent with experimental 
observations (see Fig. l6] ||47ll ). 

The phase behavior can be separated into five regimes, whose locations are indicated on the phase diagram shown in Fig. [14] 

1 . No assembly at equilibrium. In this regime the interactions driving assembly are too weak to overcome the rotational 
and mixing entropy of free subunits, and virtually all subunits are present as free dimer in equilibrium. As discussed in 
section HTBI this regime corresponds to subunit concentrations below a critical value p < p*, whose value depends on the 
subunit-subunit binding free energy; the values of p* are shown as dashed and white lines in Figs.[T4land[T6lrespectivelv. 

2. No assembly on relevant time scales due to a nucleation barrier For concentrations sufficiently close to the critical value, 
p > p*, nucleation barriers are prohibitive (see section HII A 3fa nd Fig. IIII Ab . Thus assembly is not seen on timescales 
that are accessible to simulations (or experiments) at these concentrations. This is the first kinetic effect that can prevent 
assembly at long but finite times. This regime is seen in the phase diagrams shown in Figs. [T4| and [T6] where there is a 
region between p* and parameter values at which assembly is observed. 

3. Productive assembly. For moderate parameter values initial nucleation barriers are small enough such that finite-time 
assembly yields can be quite large, with > 90%. 

4. Free subunit starvation kinetic trap. The first form of kinetic trap described in section UlI A 1 [ arises due to the constraint 
of mass conservation. When nucleation is fast compared to elongation (p^t in Eq. \T% . too many capsids nucleate at early 
times, and the pool of free subunits or small intermediates becomes depleted before a significant number of capsids are 
completed. This phenomenon can be seen readily in the time series shown in the right panel of Fig. [T3] Except to the extent 
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FIG. 14. Assembly products at long times for 60-subunit patchy- sphere model T=l shells as functions of binding energy Eh, angular specificity, 
8m, and total particle concentration px- Solid squares indicate parameter sets for which there were significant yields of well-formed capsids 
/c > 0.3, while open squares indicate poor assembly, /c < 0.3. The dashed line indicates the parameter values above which significant capsid 
assembly shou ld oc cur at equilibrium. The location of the five regimes discussed in the text are shown on the phase diagram on the left. Figure 
based on Ref. flMl. 




FIG. 15. Assembly products at long times for a 20-subunit extended subunit T=l shell as a function of temperature (i.e. inverse of interaction 
strength) and particle concentration. Representative structures are shown for the well-formed and mis-assembled regions. Figure adapted 
with permission from Nano Lett., 7, 338-344 (2007), Deciphering the kinetic mechanism of spontaneous self-assembly of icosahedral capsids, 
Nguyen, Reddy, and Brooks, Copyright (2007) American Chemical Society. 



that the remaining partial capsids have geometries which allow for direct binding, further assembly requires the analog of 
Ostwald ripening, in which subunits unbind from smaller partial capsids and are scavenged by larger intermediates. This is 
an activated process since it requires bond-breaking; hence it is generally slow in comparison to time scales for assembly 
at parameter sets that do not lead to trapping. This form of kinetic trap was first predicted with rate equations models by 
Zlotnick and coworkers ll80ll29lll . and was shown to be consistent with experiments (e.g. the largest salt concentration in 

Fig.EK). 

5. Malformed capsids. The second form of kinetic trap arises when subunits forming strained bonds that deviate from the 
ground state of the interaction potential are trapped into growing clusters by subsequent subunit additions. For example, 
in T=l capsids it is common to observe hexameric defects at the five-fold vertices. In some cases defects lead to closed 
shells that lack icosahedral symmetry, whereas in other cases the curvature is disrupted significantly enough that spiral 
structures form. Nguyen and coworkers [|l90l catalogued sets of hexameric defects that lead to closed or open structures 
(see Fig. 

The predictions and observations that capsid assembly yields are nonmonotonic with respect to driving forces have contributed 
to a wider understanding that many exar nples of self-assembly are most efficient when structures are stabilized by numerous 
relatively weak interactions UtI III HI [lO^ [lol [IM [HI llM lUll EH Elll H?! E?! EH. While strong interparticle 
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FIG. 16. Assembly products at long times for a 12-subunit patchy-sphere model icosahedron. The fraction of subunits in target clusters is 
shown as a function of the patch width cr (measured in radians) and reduced temperature. The inset shows the equivalent plot for a system with 
the same parameters except without dihedral terms in the interaction potential. The image at the top right shows the target structure, while 
the lower images show regions of the system for simulation at the indicated parameter values. The white lines show the temperature for the 
equilibrium transition from assembled clusters to a gas of monomers calculated from umbrella sampling. Figure and images reprinted with 
permission from J. Chem. Phys., 131, 175102 (2009), Monodisperse self-assembly in a model with protein-like interactions, Wilber, Doye, 
Louis, and Lewis, Copyright(2009) American Institute of Physics. 



bonds stabilize the ordered equilibrium state, they also promote and stabilize the two kinetic traps described above that frustrate 
assembly. Thus, effective self-assembly proceeds by relatively transient bond formation, with bond-breaking events that are 
near ly as frequent as bond formations. This idea wa s firs t suggested in the context of virus assembly by Zlotnick and coworkers 
ll47i l29l[l ( Fig. |6|l and by Schwartz and coworkers ll227ll based on preliminary particle-based simu lations. The extent t o which 
capsid assembly reactions approach reversibility has been monitored by a variety of metrics (e.g. lll06[ Il24[ |208[ |21 ill ) which 
include measuring relative frequencies of bond formation and bond-breaking jl24, 208, 2110, fluctuation dissipation ratios lll24ll . 
and the extent to which clusters of similar size are in relative Boltzmann equilibrium Il06ll . It has been shown that the extent 
to which reversibility is violated can be correlated t o yiel ds of well formed capsids. Similar approaches have been applied to 
models for crystallization ( e.g. ll99l [lOO, 106, 124 Il43ll ). Despite the fact that viral capsids are monodisperse closed shells 
whereas crystals are extended structures, the correlations between reversibility and assembly yields in models of crystallization 
are strikingly similar to those observed for models of capsid assembly. 

Given the significant amount of attention which has been accorded to reversibility, it is important to note that the strong inter- 
actions (large degree of supersaturation) which lead to violations of reversibility contribute to assembly failure through both of 
the kinetic traps described above. In the case of the free subunit starvation trap, the Boltzmann factor in Eq. [T9]identifies that 
nucleation rates increase much more quickly than elongation rates and there is a threshold binding energy or subunit supersatu- 
ration above which the condition will be violated. Furthermore, once free subunit starvation sets in, the Ostwald ripening which 
eventually leads to equilibration requires bond-breaking and hence is characterized by a timescale which increases exponentially 
with binding energy. 

The malformed capsid trap arises when subunit addition to a growing partial capsid occurs more quickly than already as- 
sociated subunits can anneal defective or strained interactions. The rate of subunit addition is proportional to free subunit 
concentration. Annealing in general requires some bond-breaking and hence is characterized by a timescale which increases 
exponentially with binding energy. 

While both the free subunit starvation trap and malformed capsids arise for strong interactions or high subunit concentrations, 
which effect dominates at a given parameter set is a strong function of the directional specificity. Highly specific binding 
interactions imply that imperfect subunit interactions are unstable and only very strong binding energies lead to malformed 
capsids. Therefore, for interactions with sufficiently high directional specificity, the threshold binding energy for malformed 
capsids is well above the threshold binding energy for the free subunit starvation trap (Eq. [19}. From an evolutionary standpoint, 
or from the perspective of designing synthetic or biomimetic assembly systems, it might appear that maximizing directional 
specificity would be optimal for productive assembly. However, inc reasing dir ectional specificity reduces the subunit kinetic 
cross-section and thus leads to lower subunit-subunit binding rates ll05l l268"ll . There is a trade-off between selectivity and 
kinetic accessibility which limits the optimal degree of directional specificity for finite-time assembly reactions. In the case of 
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FIG. 17. Population distribution of st ructures obtained at long times for near optimal parameters using discontinuous molecular dynamics for 
a T=l model by Nguyen et al. lll90ll . The structures were defined by Nguyen et al. I190II as (A) complete icosahedral capsids, (B) oblate 
capsules, (C) angular capsules, (D) twisted capsules, (E) tubular capsules, (F) prolate capsules, (G) conical capsules, (H) partial capsids, and 
(I) open mis-aggregates. Figure reprinted with permission from J. Am. Chem. Soc. 131, 2606-14 (2009), Invariant polymorphism in virus 
capsid assembly, Nguyen, Reddy, and Brooks, Copyright(2009) by the American Chemical Society. 



capsid proteins the degree of directional specificity which is physically realizable is limited by the nanometer length scale of 
the underlying hydrophobic and electrostatic interactions. The degree o f spe cificity used within coarse-grained models can be 
roughly esti mated based on the physical interaction length scales (e.g. II88II ) or it can be calculated systematically via coarse 
graining id [HI. 

Ex perimental observations of partial or malformed capsids. While light scattering experiments on HB V ll29l|] and CCMV 
II289II clearly indicate that stronger than optimal interactions lead to reduced assembly yields, these results were interpreted in 
the context of the monomer starvation trap. In general products of reactions performed at stronger than optimal parameters have 
not been well characterized by imaging, in part because variable sizes and defective structures limit the use of multi-particle 
and/or icosahedral averaging, but also because effort has been focused on parameter sets which produce high yields of capsids. 
However in a notable early in vitro study Sorger et al. i237|l identi fied m alformed turnip crinkle mosaic virus capsids assembling 
around the genomic RNA. Furthermore, Teschke and coworkers lll97l[ have catalogued a series of malformed structures which 
result due to mutations in the P22 capsid protein, and Stray et al. ll240ll showed that HBV assembly is accelerated and defective 
in the presence of an antiviral drug. 

The role of solvent. The solvent plays several important roles in capsid assembly. As described in section III Al subunit- 
subunit association is typically driven by hydrophobic interactions with moderate ionic strengths required to screen electrostatic 
interactions. These solvent-mediated effects have been incorporated implicitly in particle-based capsid assembly simulations via 
the subunit-subunit interaction potentials. Second, the solvent absorbs kinetic energy released by the formation of low-energy 
subunit-subunit interactions. Third, random buffeting by the solvent helps annealing. The latter tw o effects have been incor- 
porated through a hydrodynamic drag and stochastic buff eting forc e in Brownian dynamics ^ d [lol [Tol, UtI or through 
explicit inclusion of inert solvent particles by Rapaport ||208[ |21 fll . Analysis of simulated trajectories from both approaches 
indicated that the random buffeting plays an important role in annealing by in ducing dissociation of improperly bound subunits. 
In contrast, simulations which used a thermostat but no implicit solvent II210II demonstrated much less efficient annealing during 
assembly because improperly bound subunits dissociated only upon collision with another subunit. 

A fourth effect of solvent is to introduce hydrodynamic interactions between subu nits. This effect is typically not accounted 
for within a Brownian dynamics simulation because of the computational expense ll224ll . Hydrodynamic coupling has been 
effectively included in as s emb ly simulations using a cluster move Monte Carlo |.267| and is intrinsically included in the explicit 
solvent simulations l208l |21 albeit at a significant computational expense. The primary effect of neglecting hydrodynamic 
interactions is that bonded clusters are 'freely draining', and thus experience a hydrodynamic drag that is proportional to the 
number of subunits rather than the cluster hydrodynamic radius. Hydrodynamic interactions also influence subunit collision 
rates, but a quantitative estimate of collision rates is likely to require atomic -resolution models in any case. To this point, no 
significant differences in assembly behavior between explicit solvent and Brownian dynamics simulations have been reported. 



E. Differences among models 

The models described in this section are highly simplified and thus designed to uncover generic assembly mechanisms for 
icosahedral shells. In that regard it is encouraging that results described above are consistent across most or all of the models. 
We now discuss some of the key differences between the models. 

We already noted in the previous section that state-based models have not been able to describe the malformed capsid trap. 
Among particle-based models, the structural details of ensembles of malformed structures predicted by dynamical simulations 
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are somewhat dependent on details of the model interaction geometries. E.g., trapezoidal s ubun i ts ca n form defects not available 
to triangular subunits and thus do not give rise to the same ensemble monster structures lll88[ Il9dll . Similarly, we have found 
that patchy sphere models can form closed shells which are smaller than the gr ound state geometry under strong interactions, 
which does not seem to occur for triangular or trapezoidal subunits. Also, Ref. llOSll found that different subunit valencies can 
lead to different sets of assembly pathways even if the overall assembly efficiency and kinetics are similar 

Insertion of the last subunit. In all particle-based models insertion rates decr ease as the capsid nears completion due to steric 
hindrances. This effect appears to be amplified in the model of Nguyen et al. ll88lll9Cill . where insertion of the final subunit 
suppresses internal vibration modes of the partial capsid. Insertion of the final subunit was found to be rate limiting and even free 
energetically unfavorable for some parameter ranges. Experimental evidence supporting this observation has been l acking; fo r 
example mass spectrometry studies of HBV capsids indicated an overwhelming preponderance of complete capsids ll254tl256ll . 
It is worth noting though that most imaging studies of capsid structures use icosahedral and/or multiparticle averaging and 
thus would not detect missing subunits. Although experimetally challenging, a systematic search for defects, including missing 
subunits, under a variety of assembly conditions would be of great interest. 



F. Higher T numbers 

As described in section II Al icosahedral capsids with more than 60 proteins (T > 1) require precise arrangements of pen- 
tameric and hexameric capsomers, often with different protein conformations in the distinct but 'quasi-equivalent' sites. We 
first present studies that investigated the relative thermodynamic stabilities of possible capsid structures, followed by studies 
that attempted to discover how the correct arrangement of conformations (or of pentamers and hexamers) is achieved during the 
dynamics of shell construction. 



1. Structural stability of dijfereiit capsid geometries 

T he re lative thermodynamic stabilities of shell geometries have been investigated by Monte Carlo simulations. Zandi et al. 
ll40ll279ll studied shells comprised of two species of discs, representing pentamers and hexamers confined to a spherical surface, 
and found that the Caspar and Klug geometries correspond to minimum free energy configurations for appropriate size-ratios of 
the discs. Chen, Zhang, and Glotzer lf54ll studied the assembly thermodynamics of cone-shaped particles. For decreasing cone 
angles the particles assembled into convex shells of increasing sizes corresponding to 'magic numbers'. Certain magic numbers 
corresponded to icosahedral shells, and the assembled structures were found to correspond to equilibrium structures of colloids 
confined to spherical surfaces. Similar approaches have been used to investigate the relative stabilities of potential prolate 
structures. Chen et al. llssll performed Monte Carlo simulations of spheres pack ing on prolate surfaces and found structures 
consistent with some prolate virus capsids, while Luque, Zandi and Reguera il70ll found that free-energy minimization of disks 
representing pentamers and hexamers confined to a prolate surface led to structures consistent with a number of prolate or 
bacilliform capsid structures and identified selection rules for the length, structure, and number of capsomers for prolate capsids. 

Fejer, Chakrabarti, and Wales ||82| studied the formation of shells by disk-shaped or ellipsoidal subunits with anisotropic 
interactions that dictated a preferred curvature, via searching for minima in the configurational energy landscape. For disk- 
shaped subunits the results resembled those of the assembling cones lf54ll with global minima corresponding to icosahedral 
structures for particular preferred curvatures. For appropriate arrangements of anisotropic interactions, they were also able to 
describe tubular, helical and head-tail morphologies (i.e. resembling tailed bacteriophages), as well as multi-shelled structures. 

Continuum elasticity theory has also shed light on capsid shapes. The fact that small icosahedral capsids tend to be spherical 
while larger capsids tend to look more facete d (i.e . icos ahedral) iITtIi was reproduced by continuum elastic models in which the 
facet ing corresponds to a buckling transition lll59[|229]| . A similar approach was applied to spherocylindrical and conical shells 
II191II and reproduced features of retrovirus capsid shapes lll92ll . 



2. Dynamics of forming icosahedral geometries 

Berger et al. proposed a system of 'local rules' llsill in which subunit-subunit binding interactions are highly conformation 
specific. I.e., subunits with a particular conformation. A, bind strongly to A-sites on a growing capsid (binding sites for which the 
conformations of neighboring subunits favor the A conformation), but bind weakly or not at all to other sites. Simulations were 
performed using trivalent subunits, with different subunit conformations represented by differences in interaction geometries. 
Assembly dynamics was modeled by placing subunits one-at-a-time at binding sites on a growing capsid according to the local 
rules. It was found that icosahedral capsids could assemble with high fidelity, even under a certain degree of flexibility in subunit- 
subunit interaction angles. Relaxation of local rules led to the formation of malformed structures, such as spiraling shells ll226ll 
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and o ne of the first suggestions that it might be possible to develop antiviral agents that act by misdire cting capsid assembly 
II204II . The local rules are generalizable and were used to describe papovirus {polyomaviridae) assembly ll226ll . 

S everal of the molecu lar dynamics studies of capsi d asse mbly described above modeled assembly into r=3 or T=4 capsids 
ifTsi Il28l I190I I2O8I I2IOII and Nguyen and coworkers lll89ll modeled the assembly of preconstructed pentamers and hexamers 
into capsids as large as T=19. In most cases these studies used conformation-dependent subunit-subunit interaction energies 
corresponding to strict local rules, with each subunit locked into a particular conformation. They found that larger capsids 
can assemble under conditions similar to those which lead to well-formed T=\ capsids and are constrained by similar kinetic 
traps. However, in general parameters need to be more tightly tuned the larger the capsid is, as there are more opportunities 
for misdirection of the assembly pathway and elongation times are longer (hence requiring longer nucleation times according to 
Eq. [T9T l. As noted above, the ensemble of malformed structures that results for non-op tima l parameters is somewhat dependent 
on subunit interaction geometries, and thus depends on the preferred T-number. Ref. lll90ll also found that details of assembly 
mechanisms could differ between T=l and T=3 capsids. 

Breaking the local rules. There are two experimental observations which seem inconsistent with the assumption of strict 
local rules, at least for small icosahedral viruses. F irst, f or many viruses the structures of binding interfaces are quite similar for 
subunits in different conformations (see e.g. Ref. ll245ll ). Second, capsid proteins which assemble into a particu lar ic osahedral 
structure with high fidelity can adapt to form different icosahedral morphologies under different conditions 111 [111 Il53ll . Further- 
more, as described further in section |IV] capsids exhibit poly morphism when assembling around cargoes with incommensurate 
sizes. For example, Dragnea and coworkers lissl I68ll70l E4l1l demonstrated that brome mosaic virus (BMV) proteins assemble 
into T=l, pseudo-T2, and T=3 capsids around charge-functionalized nanoparticles with different diameters. These observations 
of polymorphism in empty and cargo-containing capsids raise the questions of how strongly subunit-subunit interactions across 
a given interface can depend on conformation (i.e., how strict are the local rules), and how strong of a conformation-dependence 
is required to assemble into icosahedral structures. 

Elrad and Hagan simulated assembly of T=3 empty capsids with a model in which the conformation-dependence of the 
subunit-subunit interaction energies was systematically varied. They found that T=3 capsids could form with high fidelity pro- 
vided that interactions which violate the conformation-dependence of the T=3 structure were 20% weaker than those consistent 
with the target geometry. If the conformation dependence of the interactions was less specific, asymmetric closed shells or open 
spiraling structures were the dominant assembly morphologies. Interestingly, since the binding free energy at which capsids 
assembled successfully is only on the order of 5 — IO/cbT, a 20 % reduction in free energy for subunits with the wrong set of 
conformations corresponds to a free energy difference only on the order of k^T, which could easily arise from minor variations 
in binding interfaces. It was found that this level of conformation-dependence did allow for adaptable assembly into alternative 
icosahedral geometries around nanoparticles with different sizes ifTSll . 

The difficulty of dynamically constructing icosahedral shells in the absence of conformation-specific interactions is elucidated 
by the recent study of Luque et al. Il69ll . which examined the dynamics of monolayers of spheres constrained to growing on a 
spherical ma nifold . Recall that a series of 'magic number' equilibrium structures were identified for this system in Ref. |[54ll . 
Luque et al. Il69ll found that line tension of the growing shell drives premature closure of the shell thus hindering formation 
of defect-free structures. Other mechanisms promoting disorder became important for structures comprised of more than 50 
subunits. Interestingly, small defect-free shells could be assembled by adjusting the radius of the manifold. The important roles 
that template can play in directing assembly will be discussed in section [TVl 

Recent studies indicate that RNA-protein intera ctions play a key role in determining assembly pathways for the ssRNA bacte- 
riophage MS2 ||2l|7lS|73[il3|2ll|23l,|24i. Stockley and coworkers used mass spectrometry to show that conformational 
switching of the MS2 capsid protein can be regulated by binding of short RNA step-loops, with sequence-dependent activity. 
Dykeman and Twarock performed all-atom normal modes calculations on the capsid protein in the presence and absence of 
RNA which identified a potential allosteric connection between the RNA binding site and the flexible FG loop which undergoes 
the bulk of the conformational change [[73t l. Coarse-grained computational modeling indicated that RNA binding influences 
subunit-subunit association rates and a conformation-dependent manner IttIi . Furthermore, mass spectrometry identified two 
intermediates in the assembly pathway |[23^ as well as their concentrations during assembly. This information was used to build 
and fit parameters for a Master equation model of assembly lll87ll . The results indicated that there are two dominant assembly 
pathways, whose prevalence is determined by the stoichiometric ratios of RNA and capsid protein. 



IV. CARGO-CONTAINING CAPSIDS 
A. Structures 

In this section we consider capsid assembly around RNA or other types of cargo. The structures of numerous virus capsids 
assembled around single-stranded nucleic acids hav e been revealed to atomic resolution by x-ray crys tallography and/or cryo- 
electron microscopy (cryo-EM) images (e.g.illli [ml UItI [Hi HM HH HM [HI [2491 [257ll259ll ). The packaged nucleic 
acids are less ordered than their protein containers and hence have been more difficult to characterize. However cryo-EM 
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experiments have identified tha t the nucle otide densities are nonuniform, with a peak near the inner capsid surface and relatively 
low densities in the interiorll6ll l246[l290ll . For some viru ses striking image reconstructions show that the packaged RNA adopts 
the symmetry of its protein capsid (e.g. ll225l l246i l249ll V While atomistic detail has not been possible in these experiments, 
all-atom models have been derived from equilibrium simulations ll67l[89ll . 



B. The thermodynamics of core-controlled assembly 

In this section we extend the thermodynami c ana lysis of secti on III B I to include interactions with an attractive core, following 
the calculations of Zandi and van der Schoot l28lll and Hagan ll04ll . We consider a dilute solution of capsid protein subunits 
with total density px, and cores (i.e. polymers or nanoparticles) with density xj. We define a stoichiometric ratio as the ratio of 
available cores to the maximum number of capsids which can be assembled, r = Nxj/pj, with N the number of subunits in a 
complete capsid. Subunits can associate to form capsid intermediates in bulk solution or on core surfaces. 

Following section HTb] we minimize the total free energy under the constraints that the total subunit and core concentrations 
are fixed: 
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with pn and Xn respectively the concentrations of empty capsid intermediates with ?? subunits or cores complexed with n subunits 
(for simplicity we assume here that cores cannot form complexes with more than N subunits and that x n gives the concentration 
of well-formed capsids assembled around cores). We then arrive at two laws of mass action (compare to Eq. |5]l 

Pn^^0 = (Pl^'0)"cxp[-/?G-P] 

XnVo = vqXo (piVq)"' exp[-^G^™] (22) 

with xq the concentration of empty cores. Here C^^^ is the empty capsid free energy (Eq. |2) discussed in section IIIBI The 
quantity G^°"^ is the free energy for a core with n subunits which includes the core-subunit interactions. These are the crucial 
interactions that drive capsid proteins to assemble around the core, and models for this quantity are discussed next. 

As shown in section HTB] there is a threshold concentration p* = exp[/3G^''/ (A^ — 1)] below which essentially no empty capsid 
assembly occurs (Eq.fTOli. However, the core-subunit interactions can further stabilize a complete capsid so that G^^"' < G^'' and 
core-assisted assembly can occur at lower concentrations. This capability is exploited by many ssRNA viruses, whose capsids 
assemble only in the presence of RNA or other polyanions at physiological conditions. 

For core-controlled assembly we focus on two experimental observables, the fraction of subunits in capsids /c and the packag- 
ing efficiency, /p, meaning the fraction of cores contained in complete capsids. To simplify the analysis we assume cooperative 
association of capsid proteins to c ores, meaning that we neglect partially assembled intermediates. A full analysis including 
intermediates is performed in Ref. lll04ll . We first consider p < p* so no empty capsid assembly occurs and /c = r/p. The de- 
pendence of /c and /p on the system parameters can be understood from three asymptotic limits, depending on the stoichiometric 
ratio r, which were described by Zandi and van der Schoot [,281,1 . We then consider a fourth limit, p > p*. 

1. r <C 1 and p < p*. For sufficient excess subunit, the free subunit concentration can be treated as approximately constant. 
Using Eq.|22]and following the analysis leading to equation Eq.fTTIobtains 

/c = - ^7^44^ (23) 

1 + ip/p**r 

p**vo = cxp [/3G5^^7iV] , (24) 
where the threshold concentration p** is smaller than that for empty capsids p* due to the subunit-core interactions. 

2. r = 1 and p < p*. For exactly enough capsid protein to encapsidate every core the concentration of free cores is related 
to that of free capsid protein by xq ^ pi/N and we obtain the same form as for empty capsids 

«1 forpT<P** 

« 1 - ^ for pT > P** (25) 

pr 

but with p** as in Eq.l24l 
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3. r ^ 1 and p < p* . For excess core and cooperative subunit-core complexation (since we are neglecting partially covered 
cores) we obtain Eq. |25]but with 

p** = exp {PGT/N) {Nx^)-^/'' (26) 

where we have assumed iV 1. 

A. p > p*. The conditions above apply in the limit p < p* where empty capsids cannot assemble. For larger subunit 
concentrations empty capsids compete with assembly on cores. Due to the entropy cost of assembling a capsid on a core, 
Eq.|22]shows that there is a threshold surface free energy, G™'^q — G^'' ~ —k^T \og{pca^) for a stoichio metric amount of 
capsid protein and nanoparticles, above which full capsids are favored over empty capsids and free cores [ll04t . However, 
if nucleation of empty capsids ooccurs with a rate comparable to that on cores, complete or partially assembled empty 
capsids can assemble as a kinetic trap. 

Zandi and van der Schoot also investigated the effect of stoichiometric ratio on the size of the asse mbled cap sid. They found 
that for large r (i.e. excess genome) smaller capsids form, consistent with experimental observations ll232[|264l . 



C. Single-stranded RNA (ssRNA) encapsidation 

As noted in the introduction, most ssRNA capsids assemble around their genome during replication. This process is driven 
to a large extent by electrostatic interactions between positive charges on the capsid proteins and negative charges on the RNA 
mole cule. For example, the capsid proteins of many negative-stranded RNA viruses bind RNA via a positively-charged cleft 
II222II . For many positive-stranded RNA viruses, the capsid proteins bind RNA via a terminal tail rich in basic amino acids, 
which extends into the center of the virion and is typically unresolved in crystallographic mapsJsH HH |233[ l236t |238[ l248ll . 
These peptide tails are commonly referred to as arginine rich motifs (ARMs). 

A remarkable set of in vitro experiments indicates that just the negative charge found on the RNA phosphate groups is 
sufficient to drive ssRNA capsid assembly around a cargo. Namely, experiments have been performed in which capsi d proteins 
assemble into i cosa hedral capsids around various cargoes includ ing genomic or non - geno mic RNA (e.g. Lii-miilUlIiil 
[Tin [l4l[l49l 1261 synthetic p olyelectrolytes lEol B t I Issl l60i ll 14l ITTtI [l47[ [l7l l l23l . charge-functionalized nanoparticles 
,l97lll22lll63l 164l24lll25lll . DNA micelles 1 15 ill and nano-emulsions Eoll . However, as discussed further below features 



specific to biological RNA molecules such as their tertiary structure and sequence-specific interactions likely promote selective 
assembly around the viral genome or help direct assembly toward particular morphologies. 

Optimal genome length. The importance of nonspecific electrostatic interactions in driving RNA packaging has been pro- 
posed as a constraint on the length of viral genome. For instance, Belyi and Muthukumar |27tl and Hu and Shklovskii ll 19ll 
reported a striking correlation between the total number of positive charges in the tails and the length of the genomic RNA. 
Interestingly, most ssRNA viruses are overcharged, meaning that the charge on the encapsidated RNA exceeds that of the charge 
on the inner surface of the capsid, often by about a factor of two. In support of this proposal, experiments on various viruses 
showed that partial deletions of the positively-charged re sidues in A RMs lead to virions that packaged a reduced amount of viral 
RNAs as compared to the wild-type capsid proteins ll69lll34ll263ll . In further support of this proposal, a number of theoretical 
works have investigated the free energy to encapsidate a linear polyelectrolyte as a function of its length. The primary quantity 
of interest has been the thermodynamic optimal charge ratio, or the ratio between the negative charge on the polyelectrolyte and 
the positive charge on the capsid surface or peptide tails (ARMs) that minimizes the free energy. It has been proposed that the 
observed correlation reflects the functional dependence of the optimum charge ratio, thus suggesting a thermodynamic constraint 
on the co-evolution of genome length and capsid charge. We summarize the results of these calculations here. 

Hu and Shklovskii assumed that RNA wraps around peptide tails and found that the free energy is minimized when the RNA 
contour length is approximately equal to the total contour length of peptide tails, resulting in an optimal charge ratio of packaged 
nucleotides to capsid charges of 2 : 1. Belyi and Muthukumar |27:| treated the RNA as a polyelectrolyte and the peptide tails 
as oppositely charged brushes and used the ground state dominance approximation to predict an optimal charge ratio of 1 : 1. 
They noted that if the charge on the RNA and the peptide tails were renormalized according to counterion condensation theory 
lll76ll . the predicted ratio of packaged nucleotides to capsid charges would be 1.6 : 1. However, it is worth noting that condensed 
counterions are released by RNA-peptide association and thus factor into the encapsidation free energy. Our simulations with 
explicit ions and polymers with different linear charge densities (unpublished) suggest that the bare charge density should be 
used. Calculations that placed the capsid charge entirely at the surface predicted charge ratios of < 1 : 1 |;228:] and 2 : 1 Ii260ll . 
Siber et al. showed that the optimal charge ratio was less dependent on ionic strength when the capsid charge moved off of 
the inner surface, but still found an optimal charge ratio of < 1 : 1. The dependence of the encapsidation free energy on the 
length of a linear polyelectrolyte has also been investigated in the Umit of no added salt using Monte Carlo simulations on a 
model which assumes neither the continuum limit nor spherical symmetry |0]. These calculations also predicted that the optimal 
genome length would correspond to a charge ratio of 1 : 1. 
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Ting et al. 1124711 used a self consistent field theory to show that the optimal ratio of packaged charge to capsid charge depends 
sensitively on excluded volume and thus varies with many factors including capsid size and charge density on peptide arms. 
Their model predicted an optimal charge ratio of less than 1 : 1 for all relevant parameters. They noted however that the 
cytoplasm contains a significant concentration of negative charge in the form of large macromolecules which are excluded from 
the interior of an assembled capsid. They found that the Donnan potential due to this excluded charge is essential to obtain a 
thermodynamic optimum charge ratio which is larger than 1:1, and thus suggested that this effect plays an important role in 
capsid assembly in vivo. However, recent in vitro competition assays in which different RNAs compete for packaging under 
conditions of limiting capsid protein showed that longer RNAs (up to viral genome lengths) were preferentially packaged over 
shorter RNAs by CCMV capsid proteins ||59ll . This result suggests that the genome length is nearly optimal for packaging even in 
the absence of a Donnan potential. An interesting and somewhat surprising prediction of the self consistent field theory ll247ll is 
that the encapsidation free energy was essentially the same for a linear polyelectrolyte and for a polyelectrolyte with a branched 
structure reflecting the secondary and tertiary structure of an RNA molecule. 

Although the quantitative results of these theoretical calculations vary depending on their assumptions, they agree with exper- 
iments that the optimal genome length is an increasing function of the capsid cha rge. They cannot explain the observation that 
polymers with apparent charge ratios as large as 9:1 were encapsidated in Ref. Il2l[l : however, the predictions above are for 
the optimal length and it was not possible to measure the packaging efficiencies in those experiments. Models based on linear 
polyelectrolytes also do not captu re the sequence dependence of the charge ratios observed in recent in vivo experiments for 
mutant BMV capsid proteins f 1931] (discussed next). This discrepancy suggests that sequence-specific RNA-protein interactions 
which have not been accounted for in these calculations play a role in RNA packaging during assembly. 

Several sets of experiments indicate sequence-specific roles for the RNA and proteins. Recently, Ni and coworkers lll93ll 
performed an extensive investigation of virions assembled in A^. benthamiana plants from BMV proteins with positively-charged 
residues added, deleted, or substituted. They found a correlation between the amount of positive charge on the capsid and the 
amount of packaged RNA. However, relationships between mutations and the amounts and sequences of packaged RNA were 
sensitive to factors other than charge, indicating that that the charge on the capsid is not the sole factor in determining the 
amount or types of RNA which are packaged in vivo. In vitro experiments have also shown that the BMV capsid preferentially 
encapsidates RNAs containing a tRNA-like structure Wh- In cells, interactions between the capsid protein and this tRNA-like 
structure may play a role in packaging specificity by coordinating RNA replication and encapsidation I9h1 111. Packaging signals, 
or r egions of RNA that have sequence-specific interactions with the capsid protein, are known for some viruses (e.g. HIV 
illl [163, EH) or MS2 and satelHte tobacco necrosis virus (STNV^JH l4lll). As discussed in section HITfII the MS2 RNA 
regulates conformational switching in a sequence-dependent manner ll23ll . 

Genome organization. A number of studies have simulated bead-spring models of polyelectrolytes confined within simpli- 
fied representations of viral capsids. In some cases the capsid was modeled by a spherical container with different arra ngements 
of embedded charges f^Htj, while others were based on particular capsid structures. For example, Zhang et al. ll282ll incorpo- 
rated the electrostatic potential derived from the CCMV while the model of Forrey and Muthukumar [86] captured the charge 
distribution of the inner surface of the Pariacoto virus crystal structure. These studies showed that a dodecahedral arrangement 
of surface charges [HI or basic charges on peptide tails [86[ leads the encapsulated polymer to adopt a dodecahedral cage struc- 
ture. A model of capsid assembly dynamics ll76ll also found that the polymer adopted the symmetry of the overlying capsid 
charge. ElSawy et al. iFtsIi showed that a bead-spring polyelectrolyt e pla ced inside an atomic model of MS2 could reproduce 
the observed multiple peaks of the radial RNA density distribution ll249ll and the icosahedral order of the outer layer Their 
simulations suggested that RNA-RNA repulsion and the arrangements of crevices on the inner capsid surface were the dominant 
forces directing organization of the o uter l ayer of genome. Several models have used the detailed knowledge of RNA density 
ll72ll and its icosahedral order ll39l |72[ 1221 [ l239ll to suggest RNA-directed ass embl y pathways. 

Modeling assembly around nanoparticles. Hagan ll04ll and Zandi et al. l23lll used self consistent field theories to calculate 
the interaction b etween positive charges on capsid protein tails and carbo xyl gr oups functionalized on the surface of na noparticle s 
llsil l53i l63l I97I I122I I241I I251I1 . enabling prediction of time-dependent ll04ll or equilibrium pac kagin g efficiencies lll04l23l]l . 
Siber et al. 1 23 ill also considered assembly into different icosahedral morphologies. In Ref. lll04ll the kinetics of assembly 
around comers were also considered by estimating the free energy G™"^*^ for cores partially covered with intermediates and 
extending the rate equation approach described in section HIIBI to describe the kinetics of core-controlled assembly. Both the 
kinetic and equilibrium approaches predicted a threshold density of functionalized charge below which no significant assembly 
on cores would take place. This result was qualitatively confirmed by subsequent experiments ll63ll . Siber [231] et al also 
predicted that high charge functionalization densities and excess capsid protein would favor pseudo-r=2 capsids, which was 
seen experimentally by Daniel et al. ll63ll . 



D. Dynamics of assembly around cores 



Nanoparticles. As noted in the previous section Hagan il04ll used polyelectrolyte theory to calculate the electrostatic driving 
force for capsid proteins to adsorb onto nanoparticle surfaces as a function of the nanoparticle surface charge density. This 
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FIG. 18. The capsid mod el from Ref. f?^ . (A) The model subunit, as viewed from inside the capsid. The gray overlapping spheres interact via 
repulsive potentials ll266ll , complementary capsomer-capsomer attractors (green spheres at the subunit edges) experience attractive interactions 
and the capsomer-polymer attractors (blue spheres on the subunit inner surface) experience short-range attractions to polymer segments. Sphere 
sizes indicate the interaction length scale. (B) Image of a well-formed capsid assembled around a polymer (shown in red). (C) Visualization 
of the polymer density inside the capsid. The polymer density is averaged over a large number of successful assembly trajectories after 
completion, for a polymer with length A'p = 150 segments. Densities are averaged over the threefold symmetry of the capsomer, but not over 
the 20-fold symmetry group of the completed capsid. Images reprinted with permission from Phys. Biol., 7, 045003 (2010), Elrad and Hagan, 
Encapsulation of a polymer by an icosahedral virus. Copyright (2010) lOP Publishing. 




FIG. 19. The capsid model from Ref. Il72ll . (A) The model subunit. All beads experience repulsive excluded-volume interactions. In addition, 
pairs of white beads experience short-range attractive interactions and the pink beads have a charge of +e. Electrostatics interactions are 
represented by Debye Huckel interactions. (B). The low-energy capsid structure. Images reprinted with permission from J. Chem. Phys., 136, 
135I0I (2012), Langevin dynamics simulation of polymer-assisted virus-like assembly, Mahalik and Muthukumar, Copyright(2012) American 
Institute of Physics. 



dependence was used to extend the rate equation approach (section llll Bl) to predict assembly kinetics around attractive cores. The 
calculations found a threshold density of functionalized charge, above which capsids efficiently assemble around nanoparticles, 
and that light scattering increases rapidly at early times, without the lag phase characteristic of empty capsid assembly. These 
results were consistent with experimental measurements. 

Polymers. Elrad and Hagan iItsIi developed a coarse-grained computational model that describes the assembly dynamics of 
icosahedral capsids from subunits that interconvert between different conformations (section llll F 2b . The simulations identified 
mechanisms by which subunits form empty capsids with only one morphology, but adaptiv ely a ssemble into different icosahedral 
morphologies around nanoparticle cargoes with varying sizes, as seen in experiments ll24lll . Adaptive cargo encapsidation 
required moderate cargo-subunit interaction strengths; stronger interactions frustrated assembly by stabilizing intermediates 
with incommensurate curvature. 

Kivenson and Hagan Il4lll explored capsid assembly around a flexible polymer with a model defined on a cubic lattice using 
dynamic Monte Carlo, which allowed simulation of large capsid-like cuboidal shells over long time scales. By simulating 
assembly with a wide range of capsid sizes and polymer lengths, the simulations showed that there is an optimal polymer length 
which maximizes encapsulation yields at finite observation times. The optimal length scaled with the number of attractive sites 
on the capsid in the absence of attractive interactions between polymer segments. However, introducing attractive interactions 
between polymer segments, which physically could arise from base pairing or multivalent counterions, increased the predicted 
optimal length dramatically. A limitation of these simulations was that the Monte Carlo move set did not enable simultaneous 
motions of polymer segments and capsid subunits, and thus could not accurately describe the dynamics for parameter sets for 
which cooperative polymer-subunit motions play an important role in assembly. 

Elrad and Hagan j?^ performed Brownian dynamics simulations of encapsidation of a flexible polymer by a model capsid 
with icosahedral symmetry (Fig. [18). The model considered a truncated-pyr amidal extended subunit model of the form used 
to the assembly of empty T=l capsids from trimeric subunits lll06[ |209| - |21 ill (section llll Cl l. The polymer was represented by 
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FIG. 20. Kinetic phase diagram showing the dominant assembly product as a function of polymer length A'p and capsomer-polymer interaction 
strength Scp for capsomer-polymer interaction strength Ecc ~ A.OksT and subunit concentration log px = —7.38. The legend on the right shows 
snapshots from simulations that typify each dominant configuration. Figure reprinted with permission from Phys. Biol., 7, 045003 (2010), 
Elrad and Hagan, Encapsulation of a polymer by an icosahedral virus. Copyright (2010) lOP Publishing 



FIG. 21. Doublet virus-like particles assembled from CCMV capsid proteins around tobacco mosaic viras (TMV) RNA, with 6400 nucleotides 
or approximately twice the number of nucleotides packaged in a native CCMV virion Ii43,1 . Negative-stain transmission electron microscopy 
images are shown and the scale bar is 50 nm. Image provided by C. Knobler and W. Gelbart. 



a flexible bead-spring model, with repulsive interactions between segments corresponding to screened electrostatic repulsions, 
and short-range attractions to 'charge sites' located on one surface of the model subunits, corresponding to screened attractive 
electrostatic interactions between opposite charges on the polymer and capsid subunits. Use of the short-range interactions 
(which increased computational efficiency) was justified by the fact that capsid assembly in vivo or in vitro always occurs at 
moderate salt concentrations for which the Debye screening length is on the order of 1 nm. 

Subsequently Mahalik and Muthukumar ll72ll simulated the assembly of extended triangular subunits (Fig. [T9] l around a 
linear poly electrolyte with screened electrostatics modeled by Debye Huckel interactions. The simulations led to predictions 
which were qualitatively similar to those of Ref. ll76ll . but the Debye Huckel interactions do allow for a more straightforward 
connection to experimental salt concentrations. 

Assembly outcomes. Simulations were performed over wide ranges of polyme r lengths, subunit concentrations, subunit- 
polymer interaction strengths, and subunit-subunit interactions strengths. Refs. ll76lll4l[|l72ll found that productive assembly 
around the polymer can occur for protein-protein interaction strengths and protein concentrations for which empty capsid assem- 
bly does not occur, as observed for many ssRNA viruses. The polymer-protein interactions stabilize protein-protein interactions, 
lowering the nucleation barrier and enhancing the thermodynamic favorability of the assembled capsid. Ref. lf76ll presented the 
assembly yields and assembly morphologies at long (but finite) times in a phase diagram. These observables can be experimen- 
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FIG. 22. Two mechanisms for assembly around a polymer ll76ll . (A) Strong subunit-subunit interactions and relatively weak subunit-polymer 
interactions led to a nucleation and growth mechanism, where first a small partial capsid formed on the polymer followed by sequential addition 
of subunits. (B) Weaker subunit-subunit interactions and stronger subunit-polymer interactions led to a disordered assembly mechanism, where 
more than 20 subunits (the size of a complete capsid) bound to the polymer in a disordered arrangement, followed by annealing of multiple 
intermediates and finally completion. Figure reprinted with permission from Phys. Biol., 7, 045003 (2010), Elrad and Hagan, Encapsulation 
of a polymer by an icosahedral virus. Copyright (2010) lOP Publishing. 
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FIG. 23. Snapshots from a simulation in Ref. Il72ll . in which assembly proceeded by the disordered assembly mechanism. Images reprinted 
with permission from J. Chem. Phys., 136, 135101 (2012), Langevin dynamics simulation of polymer-assisted virus-like assembly, Mahalik 
and Muthukumar, Copyright(2012) American Institute of Physics. 



tally characterized using EM. As shown in Fig. |20] there is a region of parameter space in which assembly is 'successful' meaning 
that the predominate assembly product is a well-formed capsid which completely encapsulates the polymer For stronger than 
optimal subunit-subunit or subunit-polymer interaction strengths the system becomes trapped in metastable disordered states. 
Longer than optimal polymer lengths can give rise to closed but defective capsids or partially complete capsids. Polymer lengths 
that were significantly larger than optimal gave rise to structures in which multiple nearly complete capsids were connected by 
an RNA molecule. Polymers on the order of twice the optimal length gave rise to doublet capsids (Fig. |20]right), while longer 
polymers gave rise to higher order multiplet capsids. 

Interestingly, structures with similar morphologies were observed in in vitro experiments in which CCMV capsid proteins 
assembled around RNA molecules with lengths in excess of the viral genome length ll43ll or conjugated polyelectrolytes iIstIi . 
For example. Fig. |2T| shows virus-like particles assembled around TMV RNA with 6400 nucleotides (nt), or approximately 
twice the number of nucleotides of the native CCMV RNA typically encapsulated within a single virion. These images can 
be compared to the right-most image labeled 'uncontained' in Fig. |20] The fact that the capsids were connected by RNA was 
confirmed by showing that the multiplets separated upon introduction of RNAase. RNA lengths of 9000 or 12000 nt respectively 
led to triplet and quadruplet capsids ll43ll . 

Assembly mechanisms. The simulations in Ref. lf76ll also demonstrated that there are two classes of assembly mechanisms 
that can occur around a central core (i.e. RNA or a polymer), each of which leads to a different assembly kinetics (Fig. |22] |. One 
closely resembles the nucleation and growth mechanism by which empty capsids assemble, except that the polymer plays an 
active role by stabilizing protein-protein interactions and by enhancing the flux of proteins to the assembling capsid (Fig. |22K). 
A small partial capsid first nucleates on the polymer, followed by a growth phase in which one or a few subunits sequentially 
and reversibly add to the partial capsid. Polymer encapsulation proceeds in concert with capsid assembly. In the alternative 
mechanism, first proposed by McPherson III8III . subunits adsorb onto the polymer en masse in a disordered fashion and then 
cooperatively rearrange to form an ordered capsid (Fig. EB)- In many cases excess subunits adsorb onto the polymer, so that 
subunit dissociation is required for assembly to complete. A similar mechanism was observed by Mahalik and Muthukamar 
Ii7l(Fig.|23l). 

The simulation results predict that the assembly mechanism can be tuned experimentally by changing charge densities, solu- 
tion conditions, or assembly protocols. The nucleation and growth mechanism is favored when protein-polymer association is 
weak or slow (e.g. at high salt concentration) and protein-protein interactions are strong. The disordered assembly mechanism 
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arises when protein-polymer association is strong and rapid, so that complete coverage of the polymer occurs before signifi- 
cant rearrangemen ts of already adsorbed proteins can lead to assembly. In support of this idea, the in vitro kinetics for CCMV 
described in Ref. il27ll appear consistent with the nucleation and growth mechanism, whereas the assembly protocol used by 
Cadena-Nava and coworkers ||43ll favored the disordered assembly mechanism. 

Assembly kinetics. The two mechanisms described above give rise to very different dependencies of assembly kinetics on 
control parameters such as protein concentration or interaction strengths. In the case of the nucleation and growth mechanism, the 
assembly time can be described in terms of the timescale for each phase, Tnuc and Teiong respectively (Eq. [TsT i as for empty capsids. 
The assembly kinetics depend on the stoichiometric ratio of total polymer concentration to capsid protein concentration pj, 
r = Nppj /pj, with different relationships in the limits of excess protein, excess polymer, or stoichiometric amounts of protein 
in polymer 

For brevity we analyze only one case here, the case of excess capsid protein, 7-^1. The other stoichiometric limits can be 
understood via similar arguments. In this case the capsid protein concentration remains nearly constant during the course of 
assembly. Recall (section llll A II that concentrations of partial capsids which are smaller than the critical nucleus size rinuc 
can be treated as in quasi-equilibrium, and the nucleation rate is given by the rate at which subunits associate with the largest 
pre-nucleus, which has rtnuc — 1 subunits. In the limit r ^ 1 the concentration of pre-nuclei adsorbed on polymers can be 
expressed as 

p„„„^_i = ppiVp(piwo)"""~' exp(-/3G,w-i) (27) 

with Gn„„c-i '^he interaction energy for the U nnc — 1-sized complex, including protein-polymer interactions. The latter were 
found to be crucial in both Ref. ||76I1 and Ref. lll72ll . as they effectively enhance the protein concentration in the vicinity of the 
polymer and thus lower the nucleation barrier The factor PpNj,, with pp the concentration of polymers which are not covered 
by capsids and A'p the polymer length accounts for the fact that the concentration of pre-nuclei adsorbed on polymers must 
be proportional to the number of available adsorption sites and hence is proportional to the total concentration of segments on 
uncovered polymers. The nucleation rate is then given by the rate of one additional subunit adding to the pre-nucleus complex, 
fpi where / is a rate constant, to yield 

dpp _ 
dt ~^P^™= 

Tnuc = fNpivoPir- exp(-/3G„„„,_i). (28) 

This equation is first order in polymer concentration, and t he tim escale for depleting polymers is given by Tnuc- The inverse 
dependence of Tnuc on polymer length was observed in Ref. lll4lll . 

As in the case of empty capsids, there is a second timescale Teiong describing the time required for a nucleated partial capsid 
to grow to completion. Hu and Shklovskii |120| proposed that the polymer could substantially decrease the elongation time 
in comparison to the case of empty capsid assembly because the polymer acts as an 'antenna' which increases the collisional 
cross-section. In analogy to transcription factors searching for their binding sites on DNA ||30[ ll I8ll adsorbed subunits can 
undergo one-dimensional diffusion, or sliding, on the poly mer It was predicted that the elongation time would thus decrease 
with increasing polymer length (for fixed capsid size) lll20ll as 

Teiong -iVp"'/V- (29) 

Here we have assumed that subunits add independently during elongation and thus Te^ng is inversely proportional to the subunit 
concentration according to the arguments in section llll A II The simulations in Refs. f?^ found that the polymer did significantly 
increase assembly rates (Fig. |24K). through a combination of subunit sliding and cooperative subunit-polymer motions, where 
the polymer drags associated subunits to the partial capsid like the action of a fly-fishing line with a hooked fish (or the C ooki e 
Monster bringing cookies to his mouth). Enhanced elongation rates were also observed by Mahalik and Muthukumar [|l72ll . 
However, neither study was able to consider a wide enough range of polymer lengths to investigate the scaling predicted by 
Eq.|29] 

Combining Eqs.|28]and|29]gives the total assembly time r = Tnuc + Teiong as 

T ~ iVp- Vr"""^ exp(/3G„„„_i) + N-'^^p^\ (30) 

Notice that the nucleation and elongation times have very different dependencies on the free subunit concentration pi. Therefore, 
the concentration dependence of the overall assembly time t depends on which of these processes is rate limiting. As shown in 
Fig. |24K. median assembly times measured by simulation show a crossover from nucleation limited behavior at low concentra- 
tions to elongation limited behavior at large concentrations. As anticipated, the elongation time scales as Teiong ^ Pi^ whereas 
the nucleation time scales with a larger, concentration dependent power (see section HII Al l. Notice that the elongation-limited 
regime observed at high subunit concentrations would not occur for empty capsid assembly, where elongation limiting behavior 
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polymer length 



FIG. 24. Assembly times depend on subunit concentration and polymer length in simulations of assembly around a polymer (A) The median 
overall assembly time r (A symbol), nucleation time r„uc (□ symbol), and elongation time rdong (+ symbol) are shown as a function of subunit 
density for the model in Fig. [18] Simulations at the lowest concentration used forward flux sampling fllsll to overcome the large nucleation 
barrier. (B) The median elongation time is shown as a function of polymer length for several values of the polymer- subunit interaction strength, 
Ecp- The plot in (B) is reprinted with permission from Phys. Biol., 7, 045003 (2010), Elrad and Hagan, Encapsulation of a polymer by an 
icosahedral virus. Copyright (2010) lOP Publishing. 



would lead to the free subunit starvation kinetic trap (section [III A II Eq. (fT9l)). That trap is avoided when capsid protein is in 
excess over polymers. However, the trap can occur for stoichiometric amounts of polymer and protein or excess polymer. 

The simulation s in R ef. Il72ll find trends which are qualitatively similar to those found in Ref. iItsIi altho ugh the kinetics are 
analyzed in Ref. Il72ll according to the framework of polymer crystallization kinetics theory. In Ref. lll72ll parallel tempering 
was used to calculate the free energy profile as a function of partial capsid size, which showed that the presence of the polymer 
dramatically reduced the nucleation barrier. The polymer also reduced elongation times. 



V. OUTLOOK 

In this review we have tried to present a summary of the theoretical and computational methodologies that have been used to 
model capsid assembly. We have presented examples in which modeling led to important insights which either explained existing 
experiments or suggested new ones. We have also tried to identify limitations in computational power, model construction, or 
available experimental technology which hinder the connection between theory and experiment. Clearly, as the powers of 
computers and the computational techniques performed on them increase, it will become possible to model the assembly process 
with increasing resolution and to account for features of capsid proteins and other virion components not accounted for by 
current models. 

Advances in experimental technologies may provide the most important opportunities for progress in modeling. Recent 
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experiments that visualize or otherwise monitor the formation of individual viruses (e.g. ll24l llOSl Il32i l285ll ) can provide 
information about distributions of capsid growth times and the rates at which individual subunits associate to or dissociate from 
partial capsids of different sizes. This information would provide additional constraints with which to refine or extend current 
models. At the same time, the structures of most intermediates on assembly pathways remain challenging to characterize, and 
thus models which can predict pathways provide an important complement to these new methods. 

The modeling efforts covered in this review have primarily focused on in vitro experimental systems for the obvious reasons 
of reduced complexity and increased control over system parameters offered by the test tube. Looking ahead, it will be important 
for models to include effects relevant to the environment of host organisms, such as molecular crowding, compartmentalization, 
and coupling between translation and assembly. However, quantitative data from in vivo experiments is essential for building 
such models, estimating their parameters, and for testing model predictions. Ultimately, by combining complementary in vivo 
experiments, controlled in vitro experiments in which key parameters can be tuned, and modeling that can elucidate individual 
assembly pathways and the effect of individual parameters, we can identify the factors that confer robustness or sensitivity to the 
process of virus assembly. 
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